RECENT PRODUCTIVITY TRENDS AND THEIR
IMPLICATIONS* 


W. D. Evans 
U. S. Bureau of Labor Statistics 


For purposes of measurement, labor productivity may be 
defined in terms of the relative numbers of man-hours required 
to produce equal physical quantities of goods at two different 
periods of time. It may also be defined in terms of the values 
of goods produced per man-hour in different periods assuming
that changes in prices have not occurred. The two concepts 
are not equivalent when groups of products or industries are 
to be considered. The latter measure, unlike the former, may 
rise or fall even when output per man-hour remains unchanged 
for every component product. The difference in measurement 
is especially important during periods, such as the present one, 
when there are substantial shifts in the composition of produc- 
tion. 

Changes in output per man-hour varied widely from indus- 
try to industry during the war period, reflecting differing ex- 
periences with respect to organization, material supplies, man- 
power, equipment installations and scale of production. In 
general, the nonmunitions industries were less favorably situ- 
ated than the munitions group. The post-reconversion period 
should be marked by unusually rapid increases in output per 
man-hour for many manufacturing industries as there is compensation
for wartime restrictions. The expected changes in
output per man-hour will have important implications with 
respect to real wages, prices, profits, employment levels, stand- 
ards of living, and the position of the United States in interna- 
tional trade. 


THE subject of productivity change is today receiving a great deal
of thought and attention, and with respect to some phases of the
problem, a certain amount of controversy has developed. There is 
general agreement about the size of productivity increases which oc- 
curred prior to 1941. In the period between the wars, most industries 


* A paper presented at the 105th Annual Meeting of the American Statistical Association, Cleve- 
land, Ohio, January 25, 1946. 
achieved large increases in man-hour output, although the rate of
advance did vary from industry to industry. It is well accepted that the
primary factor underlying these long-term gains was increasing tech- 
nical knowledge, and the application of this knowledge to industry. 
Better processes, better machines, better handling of materials, better 
organization of jobs—all contributed to the steady improvement of 
productive efficiency. 

After 1941, there is no general agreement as to the facts. In part, 
this is probably because the pattern of change was not consistent from 
industry to industry. Tremendous increases in output per man-hour 
occurred in some industries; in others, there was little change during 
the war period; and in a few, there were actual declines. It is, therefore, 
not surprising that generalizations concerning wartime changes in 
productivity vary according to the interest and experience of the per- 
son making the generalization. 

A very important reason for lack of agreement is uncertainty regarding
just what we are attempting to measure under the heading
“productivity.” In some cases, different observers have measured
fundamentally different quantities, but each has labeled his own result as
“productivity.” The ensuing lack of agreement in figures in some cases 
has led improperly to arguments, rather than, properly, to an examina- 
tion of fundamental concepts. Accordingly, I think a short space may be 
devoted profitably to an examination of just what it is we should like 
to measure. 

To the Bureau of Labor Statistics, productivity has meant the 
measurement of productive efficiency, using the expenditure of human 
effort as a yardstick. We are interested in determining whether a given 
job takes more or less labor over a period of time. From the very first 
studies of productivity, production has been defined wherever possible 
in direct physical terms, and the expenditure of labor in terms of man- 
hours. Thus, one of the earliest large-scale studies of productivity, the 
monumental two-volume report of the Commissioner of Labor Statis- 
tics for 1898, measures the specific number of man-hours required to 
produce each of hundreds of separate items using alternative meth- 
ods of production. 

So long as they concern only one specific product at a time,
statements regarding productivity remain relatively clear and
unambiguous. No confusion of meaning is possible when it is said that the
production of one thousand five-cent long-filler cigars by hand requires about
33 man-hours, or that the cement industry expended about four-tenths 
of a man-hour to produce a barrel of cement just before the war.
Difficulties begin to arise when statements about productivity cover more
than one product. 

Most industries typically produce a variety of separate products. 
For most industries, however, there is only one measure of labor input, 
a total for the industry. To measure productivity in such an indus- 
try, some composite measure of total production is obviously needed. 
It is well known that it is always possible to prepare more than one 
production measure where a composite of many items is to be covered. 
The problem is to select a measure which is satisfactory for the pur- 
poses of computing productivity. 

This problem may be approached in several different ways. Fortu- 
nately, they lead to the same answer. One method is as follows: it is 
clear that we can make a statement about the number of man-hours 
required to produce a barrel of cement in two different years, and so 
make a statement about productivity changes. It is equally possible to 
make similar statements regarding a ton of pig iron or a thousand
cigarettes. Suppose, however, that we wish to make an equivalent
statement regarding all of these heterogeneous commodities considered
together. This is clearly possible if the comparison is based on some fixed
market basket of goods. Arbitrarily, this might consist of 5,000 barrels 
of cement, 5,000 cartons of cigarettes and 50 tons of pig iron. The 
number of man-hours which would be required to produce this market 
basket of goods in the first year could be computed, and a similar 
computation made for the second year. If the number of man-hours 
had declined over the period, it would be clear that productivity might 
be said to have increased in proportion. The question is, what market 
basket of goods should be selected for making comparisons? The 
obvious choice is the actual production of goods in some base year or 
reference year. Once this choice is made, the form of the production 
measure to be adopted is fixed. Without going through the technical 
details, it turns out to be an index of production in which the physical 
amounts of the separate items produced are weighted by their unit 
labor requirements. 
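
The market-basket comparison can be sketched numerically. The basket quantities below are the arbitrary ones named above; the unit labor requirements for the two years are hypothetical figures invented purely for illustration, not Bureau data.

```python
# Fixed market basket (quantities taken from the arbitrary example in the text).
basket = {"cement_barrels": 5000, "cigarette_cartons": 5000, "pig_iron_tons": 50}

# Hypothetical unit labor requirements, in man-hours per unit (illustrative only).
unit_labor_year1 = {"cement_barrels": 0.50, "cigarette_cartons": 0.20, "pig_iron_tons": 2.0}
unit_labor_year2 = {"cement_barrels": 0.40, "cigarette_cartons": 0.18, "pig_iron_tons": 1.5}


def basket_man_hours(quantities, unit_labor):
    """Man-hours needed to produce the fixed basket at the given unit requirements."""
    return sum(qty * unit_labor[item] for item, qty in quantities.items())


hours_year1 = basket_man_hours(basket, unit_labor_year1)
hours_year2 = basket_man_hours(basket, unit_labor_year2)

# Productivity index for year 2 (year 1 = 100): fewer man-hours for the
# same basket means proportionately higher productivity.
productivity_index = 100 * hours_year1 / hours_year2
print(round(hours_year1), round(hours_year2), round(productivity_index, 1))
```

With these assumed figures the basket takes 3,600 man-hours in the first year and 2,975 in the second, so the index stands at about 121. Substituting actual base-year production for the arbitrary basket gives the labor-weighted form described above.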

The problem may be approached in a somewhat different way. 
We may consider the entire class consisting of all the separate produc- 
tion measures which might be computed, and set up a criterion for the 
selection of one to be used in the measurement of productivity. A very 
reasonable criterion would appear to be the following: if physical
output per man-hour remains unchanged over a set period for all the
separate


1 See W. D. Evans and I. H. Siegel, Journal of the American Statistical Association, March 1942,
pp. 103-111.
items to be covered by the productivity index, that production
index will be selected which will leave the composite productivity index 
also unchanged. In effect, it is required that the productivity index for 
a number of products should be some form of average of the productiv- 
ity changes for the separate commodities covered. There is only one 
production measure which will meet the criterion we have specified, 
and it is the same as the one we arrived at previously in a somewhat 
different manner.²
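
A small numerical check may make the criterion concrete. The sketch below assumes a hypothetical two-product industry in which unit labor requirements are identical in both periods while the product mix shifts; every figure, including the unit values, is invented for illustration.

```python
def aggregate_index(weights, q_base, q_given):
    """Aggregative production index: sum(w * q_given) / sum(w * q_base)."""
    num = sum(w * q for w, q in zip(weights, q_given))
    den = sum(w * q for w, q in zip(weights, q_base))
    return num / den


# Hypothetical data (illustrative only).
r = [2.0, 0.5]           # man-hours per unit, unchanged between the periods
unit_values = [3.0, 4.0]  # constant unit values, used as alternative weights
q_base = [100.0, 100.0]   # base-period output of each product
q_given = [60.0, 300.0]   # given-period output: the mix has shifted

# Man-hour index: total man-hours in the given period over the base period.
man_hour_index = aggregate_index(r, q_base, q_given)

# Productivity = production index / man-hour index, under two weightings.
labor_weighted_productivity = aggregate_index(r, q_base, q_given) / man_hour_index
value_weighted_productivity = aggregate_index(unit_values, q_base, q_given) / man_hour_index

print(labor_weighted_productivity, round(value_weighted_productivity, 3))
```

Under these assumptions the labor-weighted measure leaves the composite productivity index at exactly 1.0, as the criterion requires, while the value-weighted measure shows a rise of roughly 83 per cent although no product is made with less labor.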

The appropriateness of various production measures for the pur- 
pose of making statements about productive efficiency is an important 
topic, because a proper discrimination between different types of meas- 
ures is not always employed. For example, there have been several 
recent instances where gross national product has been divided by 
measures of man-hour input to show that during the war period gross 
national product per man-hour has increased. This is an observation 
of economic importance, but its value is diminished if the observer in- 
terprets the increase to be due solely to increases in productive effi- 
ciency. Yet this interpretation has been made. 

Gross national product is an involved and complex statistic designed 
to measure, in a special sense, the total national output. It is useful 
for this purpose, but not for making unqualified statements about pro- 
ductive efficiency. Conceptually, GNP is built up in the following man- 
ner. The production of an industry or branch of the economy is repre- 
sented by the total value of its output, minus its purchases from other 
segments. The subtraction of materials costs, for example, is necessary 
to prevent this item from being counted more than once; that is, in the 
output of finished-goods industries and in the output of industries sup- 
plying raw and semi-finished materials as well. The gross value added 
figures for each branch of the economy may be expressed in the prices 
of some reference year to prevent price fluctuations from influencing 
the results. If the gross value added figures so obtained are now added 
for all parts of the economy, the result, with minor statistical qualifi- 
cations, is the gross national product for the specified year, expressed 
in the prices of the reference year. 

Now it is a simple fact that the gross value added per worker or 


2 Technically, it is required that the production and man-hour indexes be equal in a period when all
unit labor requirements are the same as in the base period. That is, using subscripts to designate the
period (0 for the base period, 1 for the given period) and letting $q$ represent production, $r$ unit
labor requirements, and $w$ the unspecified weights in an aggregative index of production, we have

$$\frac{\sum w q_1}{\sum w q_0} = \frac{\sum r_1 q_1}{\sum r_0 q_0} \quad \text{when } r_1 = r_0 .$$

This will hold true for all possible values of $q_0$ and $q_1$ only when the weights $w$ are proportionate
either to $r_0$ or to $r_1$.
per man-hour varies substantially from segment to segment of the
economy. In 1939, value added by manufacture per worker (which is 
numerically close to gross value added per worker) ranged from less 
than $1,000 in a number of the garment industries to more than $6,500 
in the petroleum refining and chemicals industries. This means that a 
shift in the pattern of production can substantially change the value of 
GNP per worker for the economy without any change in physical 
output per worker having occurred in any industry. To elaborate this 
key fact, suppose that in two separate years prices, average working 
hours, and output per worker in each industry are identical, and that 
total employment is the same. If, however, in the later year more per- 
sons are employed in high value-added industries and fewer in low 
value-added industries, there will be an increase in GNP per worker. 
This increase will have no bearing at all on changes in industrial effi- 
ciency, since these are by definition ruled out. 
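
The point can be put in figures. The sketch below assumes an economy of only two industries, with value added per worker set at the round figures of $1,000 and $6,500 suggested by the 1939 range cited above; the employment figures are hypothetical.

```python
# Value added per worker, constant in both years (round figures echoing the
# 1939 range cited in the text; the two-industry economy is an assumption).
value_added_per_worker = {"garments": 1000, "petroleum_refining": 6500}

# Hypothetical employment: the total is 1,000 workers in both years,
# but the later year has more workers in the high value-added industry.
employment_early = {"garments": 800, "petroleum_refining": 200}
employment_later = {"garments": 600, "petroleum_refining": 400}


def gnp_per_worker(employment):
    """Total value added divided by total employment."""
    total_value = sum(n * value_added_per_worker[ind] for ind, n in employment.items())
    return total_value / sum(employment.values())


early = gnp_per_worker(employment_early)
later = gnp_per_worker(employment_later)
print(early, later, round(100 * (later / early - 1), 1))
```

Here product per worker rises from $2,100 to $3,200, a gain of about 52 per cent, although output per worker is unchanged in each industry.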

It is very clear that during the war period a violent alteration in the 
production pattern did occur. It is also clear that the direction of the 
shift was toward the production of items where gross value added per 
worker is high. This shift in the composition of production in part ex- 
plains the increase in gross national product per man-hour. For this 
reason, it is dangerous to label the change as an increase in productivity. 
To some extent the shifts in the production pattern have already re- 
versed themselves, and it is almost certain that gross national product 
per worker will decline from the wartime peak. If we wish to interpret 
the increase as an improvement in productive efficiency, we must be 
prepared to interpret the decline which follows cessation of war produc- 
tion as a loss in productive efficiency. 

There is no implication in the above that gross national product per 
man-hour is not for some purposes a very useful measure. I am simply 
saying that we must remember that it is affected by several important 
factors, among them productive efficiency and changes in the product 
pattern of the economy. For some purposes, we will wish to separate 
these factors and study them individually. 

These and other technical difficulties have limited the number of 
industries for which proper productivity measures could be prepared. 
During the wartime period, there has also been a scarcity in the neces- 
sary fundamental statistical data. I will only mention that the Census 
of Manufactures, one of the most useful sources of information for 
preparing productivity indexes, has not been taken since 1939. Even 
where some information has been available, partial conversion of indus- 
tries to war production has introduced noncomparabilities. The Bureau 
of Labor Statistics has maintained its long-term productivity indexes 
during the war period for only 24 separate manufacturing industries.
Most of these industries are such that their product patterns were 
little affected by wartime necessities; in other words, they produced 
primarily for civilian consumption. 

There is reason to expect that substantial increases in productivity 
will occur in the next few years, and that these increases will provide 
the foundation for general wage increases and a broad program of eco- 
nomic betterment, as they have in the past. Accordingly, accurate, 
comprehensive and up-to-date information on productivity in the 
coming period will be more important than ever before. There is no 
immediate possibility of expanding the scope of index measures, nor 
does it appear that the answer lies in the detailed field studies of par- 
ticular industries which have been made in the past. These studies, 
while immensely useful, are extremely time consuming, and only a 
limited number can be made. The Bureau is therefore trying a new 
method of approach. We are attempting to secure periodic reports on 
unit labor requirements directly from manufacturers. Let me describe 
this program briefly. 

After choosing an industry for study, a group of products, which may 
range from 15 to 50 in number, is selected to represent the industry. 
Careful specifications are drawn up for each product, and a sample of 
plants manufacturing the product is chosen. From each of these plants, 
the Bureau will attempt to secure, by direct field contact, a report on 
the average number of man-hours required currently to make the speci- 
fied item. We will also make arrangements to secure equivalent reports 
in the future at regular intervals. The results will be combined, to insure
that the report of no single plant can be identified, and will yield figures
which show currently the course of labor productivity within an industry
and which permit some detail by product.

This procedure has been tried experimentally in the last six months 
and the results seem very promising. Funds have been made available 
by the Congress to initiate an expanded program. With the guidance 
and cooperation of both labor and management, this new program will 
be launched in the near future. The new approach offers one enormous 
advantage over the conventional indexes. It makes the measurement of 
productivity independent of secondary source materials. 

But let us leave plans for the future and see what the information 
we now have seems to suggest. Brief reference has already been made 
to the substantial gains which occurred in most industries prior to 
1941. The main interest, of course, is rather in recent and prospective 
developments. 

Let us first consider manufacturing, which constitutes the largest 
single group and the one concerning which there has been the most
controversy. In the first place, I do not know of any meaningful way 
to prepare any productivity statistics covering manufacturing as a 
whole for the war period. It is not possible to evaluate the production 
of superbombers, tanks, and guns in the same terms as electric refriger- 
ators, automobiles, and electric fans. This type of difficulty is always 
present in the preparation of productivity statistics covering a large 
group of industries, even in peacetime, for new products are constantly 
introduced. In normal times, however, the changes occur slowly and 
they do not, over reasonable periods, materially affect the statistics. 
During the war, on the other hand, there were sudden and radical 
changes in the types of goods produced. 

In the production of war equipment, as is well known, there were 
tremendous gains in productivity as mass production volume was 
achieved. In the airframe industry, for example, productivity more than 
tripled between the time of Pearl Harbor and the end of 1944. This is 
an impressive gain, but its full significance can only be appreciated by 
comparison with the best peacetime experience of manufacturing indus- 
tries. During the “twenties,” even the most progressive industries— 
automobiles, chemicals, and cigarettes—needed ten years to achieve a 
threefold multiplication of productivity. In the ‘thirties,’ the rayon 
industry made the outstanding advance, but, again, the threefold in- 
crease in productivity occurred over a period of ten years. No large 
industry in peacetime has approached a record of a threefold increase 
in productivity in so brief a period as three years. The airframe indus- 
try is fairly typical, and similar developments can be cited for other 
war industries. In shipbuilding, for example, the number of man-hours 
required per Liberty vessel was halved as a large volume of construction 
was achieved. 
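
The contrast between the wartime and peacetime records is sharper when expressed as average annual rates. The following sketch assumes steady compound growth over each period, which is a simplification and not a description of the actual year-to-year course.

```python
def annual_rate(total_multiple, years):
    """Average compound annual growth rate implied by a total multiple over a span of years."""
    return total_multiple ** (1.0 / years) - 1.0


# Threefold increase in about three years (airframes, Pearl Harbor to end of 1944).
wartime_airframe = annual_rate(3.0, 3)

# Threefold increase over ten years (the best peacetime records cited).
peacetime_best = annual_rate(3.0, 10)

print(round(100 * wartime_airframe, 1), round(100 * peacetime_best, 1))
```

On this reckoning the airframe record corresponds to about 44 per cent a year, against about 12 per cent a year for the most progressive peacetime industries.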

These large increases in man-hour output in the war industries are 
easily explained. Primarily, they were achieved because of the adoption 
of mass-production methods, including line systems and special pur- 
pose machines of a type which were used in other industries prior to the 
war. These gains could not have taken place before the war, because 
they depended on huge orders for standardized items. In addition, 
however, there were many entirely new techniques developed to speed 
war production, particularly in welding, light-metal technology, and 
other metal-working processes. 

It is apparent that, to the extent that the manufacture of products 
such as airplanes continues, the wartime experience will contribute to 
higher productivity in manufacturing industries as a whole. While the 
productivity level which will be achieved in the manufacture of civilian 
aircraft will not be as high as that attained during the war, it will
certainly be far above the prewar record. However, since the production
of items similar to the war products will not be of major importance, 
the greatest significance of the new techniques developed in the war 
industries lies in their possible application to civilian types of products. 
Many of the wartime techniques will doubtless be applied in the
manufacture of a variety of products. Improvements made in methods of
metal working and the rapid progress in the manufacture of communica- 
tions equipment are particularly likely to find application in the peace- 
time economy. A period of years may be necessary before some of these 
new techniques are applied on a large scale, but they should certainly 
make their contribution to increased productivity in a number of manu- 
facturing industries. 

In the nonmunitions industries, a very different development is 
found. But the record is just as creditable if the many difficulties which
these industries experienced during the war period are considered. Sta- 
tistics are available for only 24 such industries, mainly industries pro- 
ducing materials and nondurable consumer goods. A combined index 
for the 24 industries continued upward from 1939 to 1941. After 1941, 
there was a slight decline, but the average remained above the 1939 
level. Let me say clearly that this index does not represent, and is not 
intended to represent, all manufacturing or all civilian product manu- 
facturing. It does represent the combined behavior of these 24 indus- 
tries. 

The most important single explanation of these movements is the 
fact that the kind of technical progress which took place year after 
year in peacetime was completely impossible during the war. Any 
ambitious plans to introduce new processes or fundamentally different 
types of machines had to be shelved for the duration. As a matter of 
fact, manufacturers of goods destined largely for civilian use had 
trouble enough obtaining equipment needed for replacement. Thus, the 
normal improvement of plant, equipment, and process was interrupted 
because of wartime restrictions. 

There were numerous other difficulties, too. Experienced workers 
moved out of the civilian industries into the armed forces or war indus- 
tries. Industries producing goods for civilian use had last call on avail- 
able replacements. The wage stabilization program made it difficult for 
these industries to compete with the war industries for labor. In some 
fields, there were actual restrictions on the amounts which could be 
produced. Most companies were bothered by shortages of materials, 
and many of them had to improvise, as best they could, with substi- 
tutes of various kinds. In many cases, there were rapid changes from 
month to month in the availability of materials, and production
fluctuated accordingly. Where this occurred, it was obviously impossible
to organize the production process on a stable, efficient basis. It is im- 
portant to note that large declines in productivity occurred in indus- 
tries where production also fell substantially. The cement industry 
is an example. Output per man-hour continued to increase through the 
year 1942, as production expanded. After the bulk of war construction 
was completed, cement production dropped sharply, and with it pro- 
ductivity. By 1944, output was only half as great as it had been in 1942, 
and output per man-hour was 20 per cent lower than in 1942. There 
were other factors as well—long hours, unsatisfactory housing condi- 
tions and inadequate transportation in many industrial centers, the 
nervous tensions induced by anxiety over family members in dangerous 
service. 
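
The cement figures cited above carry an implication for labor input which is easily worked out. The sketch takes the two rounded ratios from the text at face value; the resulting man-hour figure is a derived approximation, not a published statistic.

```python
# Ratios of 1944 to 1942 for the cement industry, as stated in the text.
output_ratio = 0.50        # output only half as great as in 1942
productivity_ratio = 0.80  # output per man-hour 20 per cent lower than in 1942

# Since output per man-hour = output / man-hours,
# man-hours ratio = output ratio / productivity ratio.
man_hours_ratio = output_ratio / productivity_ratio
print(man_hours_ratio)
```

That is, man-hours fell to roughly five-eighths of the 1942 level while output fell to one-half; labor input could not be reduced in step with production.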

Under these circumstances, the maintenance of productivity levels 
was an astonishing performance. A decline was certain unless manage- 
ment and labor did a better and better job with the resources at their 
disposal. The record shows that they did. 

Many of the special wartime operating difficulties are already dis- 
appearing rapidly. Within a few months, materials and labor will gen- 
erally be available in satisfactory supply and there should be few per- 
sistent problems. Some increases in productivity from the wartime 
levels should, therefore, be forthcoming almost immediately. A longer 
period will be needed to compensate fully for the postponement of 
installations of new equipment. It seems fairly certain, however, that 
over the next three or four years there will be unusually large invest- 
ment in new plant and equipment. This will tend to raise productivity 
rapidly even if the new machines are not fundamentally different from 
the best types which were in use before the war. Industrial equipment 
has a long life, and the average age of the machines in use in our indus- 
tries is fairly high. Much of the equipment now in operation has become 
overdue for replacement during the war years. The volume of business 
in prospect for the coming period as well as the financial resources of 
manufacturing concerns will permit extensive installation of new manu- 
facturing machinery. 

There is another group of industries which should be mentioned, 
those which were converted to war production. The reason for conver- 
sion was usually a kinship between the peacetime products of the in- 
dustry and the items needed for war purposes. Precisely for this reason, 
new methods developed during the war and new equipment installed 
should find use especially rapidly in civilian production. The closer the 
kinship, the more directly should experience gained since 1941 prove 
of particular advantage. After initial readjustment, many of these
industries should reenter civilian production at productivity levels above
prewar peaks. 

Perhaps it is in order to add a word of caution concerning any 
statements or statistics on productivity in the reconverted industries 
which may be issued within the next few months. Until full capacity 
operations are reached, a large amount of labor will be needed without
any corresponding large output of finished goods. No comparisons should
properly be made with prewar performance in these industries until 
normal utilization of productive capacity is attained. 

In general, then, increasing productivity is in prospect for most 
manufacturing industries in the next several years. For manufacturing 
as a whole, average output per man-hour is likely to increase as much 
as one-third before 1950. As has already been suggested, part of the 
expected increase will come from the application to peacetime produc- 
tion of new techniques developed in the war industries. But it should 
be emphasized that an increase in productivity would occur even if no 
wartime developments were applied in postwar industry. Replacement 
of over-age facilities with new equipment of a type already available 
in 1939 would alone account for a considerable rise in productivity
during the coming years. Of course, it is expected that most new
equipment will be better.

The effect of the gradual spread of wartime technological develop- 
ments will be additional. New methods of working with the light 
metals, new plastics, new machines and tools have all been introduced 
during the war, chiefly for application to munitions production. Many 
of these innovations will find a place in peacetime industry and will 
contribute to its efficiency. In some cases, benefits of wartime expe- 
rience will be felt immediately. Other developments will require longer 
to work into the fabric of normal peacetime production. And some, of 
course, will have no peacetime application whatever. 

It is worth noting that a gain such as is now predicted happened 
in manufacturing after the first world war. Output per man-hour, which 
showed little net change between 1914 and 1919, rose for several years 
after 1919 at roughly triple the normal rate. 

The experience of the ‘twenties’ also suggests that general impres- 
sions of what is happening to productivity are not trustworthy unless 
based on a rather comprehensive study of the field. In the early ‘twen- 
ties,’ even when productivity was rising rapidly, a pessimistic attitude 
about productivity was very prevalent. Let me quote a statement by 
a representative of one of our largest manufacturing corporations. 
“The American workman has fallen off 20 to 30 per cent in productive
effort compared with pre-war man-power output.” This was written
in March, 1920, when we were in the middle of the most rapid produc- 
tivity increase the country has ever experienced. 

The above remarks have concerned average output per man-hour 
in manufacturing as a whole. Rates of change in different industries 
will vary, of course. Some have shown little change in productivity in 
the past and will probably show little in the future. Other active, pro- 
gressive, and youthful industries will make rapid strides in developing 
new methods and equipment. Most of our important basic industries 
will continue to improve their methods and equipment. This process of 
gradual improvement in plant methods is fundamental to the steady 
increase in output per man-hour for manufacturing as a whole. It is a 
process that can be expected to continue in the future. 

Outside the manufacturing industries are found such essential ele- 
ments in our economy as mining, public utilities, and railroad trans- 
portation, to say nothing of agriculture and construction. The story of 
productivity in nonmanufacturing industries, as in manufacturing, 
features a long-run rise in the decades before the war. Effect of the war 
on output per man-hour varied from industry to industry, but after a 
period of post-war readjustment we may expect resumption of the 
gradual rise of efficiency which has characterized the American econ- 
omy for a long time. 

A brief comment on the significance which productivity changes have 
in our economy may be appropriate. To the individual businessman, 
productivity gains are important largely as they affect his own profit 
position. But there are certain broader implications of productivity 
change for the whole economy which have far-reaching consequences.
First, there is the relationship between output per man-hour, labor 
cost, and price. The output produced per hour of labor, or productivity, 
together with the wages paid per hour of labor, determines labor cost 
per unit of output, which is a large element in production costs. A 
company which initiates a change in methods and achieves greater 
productive efficiency will be in an improved financial position. But 
where there is competition, this favored position may not be long main- 
tained. Other companies will adopt the improved methods, and there 
may be price reductions. At the same time, there may also be pressure
for wage increases. Eventually, many businessmen will be compelled 
to change to the new methods not to increase their profits, but in order 
to survive. While the initiators of technical improvements may be in 
an extremely favorable profit position for some period of time, the
nature of our economy is such that all successful businessmen must keep
abreast of technical advance. 
In this way, in the past, workers have shared in the benefits of
technical progress through wage increases and price declines. Unit 
labor cost—which is equivalent to average hourly earnings divided by 
productivity—declined more than 40 per cent in manufacturing from 
1919 to 1939, in the face of increases in hourly earnings which averaged 
more than 25 per cent. Moreover, when long-term trends are consid- 
ered, unit labor cost and prices have moved in close accord. The basis 
for the wage increases and the simultaneous declines in unit labor cost 
and prices was the large rise in productivity. 
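
The size of the productivity rise implied by these figures follows directly from the definition of unit labor cost. The sketch below treats the two "more than" figures in the text as exact ratios, so the result should be read as a lower bound rather than a measured value.

```python
# Ratios of 1939 to 1919 for manufacturing, taken at the bounds stated in the text.
unit_labor_cost_ratio = 0.60  # unit labor cost declined (more than) 40 per cent
hourly_earnings_ratio = 1.25  # hourly earnings rose (more than) 25 per cent

# Unit labor cost = average hourly earnings / productivity, hence
# productivity ratio = earnings ratio / unit labor cost ratio.
productivity_ratio = hourly_earnings_ratio / unit_labor_cost_ratio
print(round(productivity_ratio, 3))
```

On these bounds output per man-hour in 1939 was at least about 2.08 times its 1919 level; in other words, productivity somewhat more than doubled over the twenty years.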

It is sometimes assumed that productivity gains are achieved only 
by such large investment in equipment that no basis for wage increases 
is provided. It is true that when methods are changed, the new ma- 
chinery may be more expensive than the machinery it replaces. The 
reduction in labor cost may thus be offset in part by the increase in 
overhead costs. But if changes in methods did not involve some savings 
in costs, management would have no incentive to change to the new 
techniques. As a matter of fact, considering all manufacturing indus- 
tries together, there is no evidence to suggest that the amount of capital 
required per unit of output has shown any upward trend over the last 
20 years. 

The economy has largely passed through the period of substituting 
mechanized for hand methods of operation. During this earlier period, 
it is obvious that capital requirements per unit of production increased. 
Today, improved machines are in many cases being substituted for 
older machines, and the methods employed in making the machines 
themselves are being improved. Many instances can be found where the 
price of capital equipment has not increased in proportion to its greater 
capacity and durability. If it is true, as has been suggested, that capital 
requirements per unit of production achieved are not increasing, or 
perhaps are declining, this is a fact of the utmost importance in connec- 
tion with the fundamental structure of the economy. 

It may also be observed that much of the increase in productivity 
in the United States has come about through large scale production 
which in turn has permitted the use of specialized equipment. Large 
scale production necessarily implies specialization throughout the pro- 
ductive process. This has resulted in a tremendous degree of interde- 
pendence of our different citizens one upon the other. In a primarily 
agricultural society, an economic change in one locality may have little 
effect on conditions in any other. In our specialized society of today, 
an economic change in one locality may have repercussions throughout 
the world. The important thing to note is that this trend has not run 
its course, and we may expect that intensification of our interdependence 
in the future will have very fundamental and very far-reaching 
social and economic consequences. 

The relationship between productivity change and employment has 
received much attention. The fact that the long-run trend of productiv- 
ity is always upward means that the volume of production must con- 
stantly increase if large-scale unemployment is to be avoided. If the 
full benefits of technical progress are to be achieved, the human re- 
sources of the economy must not be dissipated through unemployment. 
It also is clear that it would be no solution at all to attempt to retard 
productivity advance, because the nation’s great productive efficiency 
is the main source of our high standard of living. 

Ordinarily, we think of living standards in terms of consumption 
of goods and services. We know that the average American has better 
food than the average citizen in other countries, that he has a better 
place to live, that he has more and better clothing, that he has some of 
the luxuries—an automobile, a radio, a refrigerator—and we generally 
recognize that these advantages depend on our technical superiority. 
But we may forget some of the benefits of productivity advance which 
are less apparent, but which also enter into our standard of living. The 
reduction of working hours and the increase in leisure time is one of the 
important byproducts of high productive efficiency. Similarly, the 
ability to maintain a large educational system and to do without the 
labor of children and young people depends on the high output obtain- 
able per man-hour of work. The possibility of retirement in old age is 
still another example of the “hidden benefits” of productivity advance. 

It is also worth noting that productivity is important in international 
trade. Wages and living standards of American workers can be far 
higher than wages of workers in other countries without any general 
need for high tariff barriers precisely because productivity levels are 
far higher in the United States than elsewhere. But despite the fact 
that the productive efficiency and living standards of the United States 
far surpass those of any other large nation, potentialities for further 
technical progress are by no means exhausted. 

We may, I believe, expect continuing advances in productivity, and 
it is likely that those advances will be especially rapid during the com- 
ing three or four years. The challenge is a serious one. Greater produc- 
tive efficiency and scanty production levels may bring unemployment 
and distress. Greater productive efficiency and high employment levels 
together promise standards of living for all groups far above the best 
we have known in the past. 








THE BEST UNBIASED ESTIMATE OF POPULATION 
STANDARD DEVIATION BASED ON GROUP 
RANGES 


Frank E. Grubbs and Chalmers L. Weaver* 
Ballistic Research Laboratories, Aberdeen Proving Ground, Maryland 


In order to better the efficiency of the statistic range for 
sample sizes greater than eleven, a method, which possesses 
practical efficiency, is given in this paper for dividing a sample 
into groups so that an estimate of the standard deviation of 
the normal population sampled can be formed from a linear 
function of the group ranges. The estimate so obtained is 
“best” in the sense that it is unbiased and has a smaller vari- 
ance than any other estimate which is based on a linear com- 
pound of group ranges. In dividing a sample into groups for 
using the range of groups, it is shown in the Appendix that the 
most efficient group size is eight. Table I of this paper gives 
the approximate percentage points for the best unbiased esti- 
mate based on group ranges and Table II gives approximate 
percentage points for the estimate based on mean range deter- 
mined from r groups or samples of equal size n. 


Discussion of methods and use of the tables. The use of the control 
chart in analyzing data has spread quite rapidly due in a large 
part to the simple statistic range, or maximum dispersion, which 
can be used for estimating the standard deviation of the lot or popu- 
lation which we are sampling, especially for small sample sizes. That 
is, a measure of the true variation in a lot could be estimated by simply 
taking the difference between the largest and the smallest observation 
in a sample of n items and dividing the range or maximum dispersion 
so obtained by a factor which depends on sample size. Although the 
sample standard deviation (root-mean-square of the deviations from 
the average) is a more efficient estimate of dispersion, it may be desir- 
able from a practical standpoint to use the range in view of its sim- 
plicity and since only a slight loss in efficiency is suffered in the case 
of small samples. Moreover, the computation of the standard deviation 
is considered time-consuming by many who are engaged in the appli- 
cations of statistics. It has been recognized generally that the range or 
maximum dispersion should not be used when the sample size is greater 
than about 15 or 20. The reason for this is the practical loss in 
efficiency of the range as compared to the standard deviation for sample 
sizes greater than about 10. In this connection, the efficiency of the 
standard deviation for a sample of size 10 is about the same as that of 
the range for a sample of size 12. As a further example, the efficiency 
of the standard deviation for a sample of size 16 in estimating popula- 
tion dispersion is about the same as that of the range for a sample of 
size 24. The method of comparing the standard deviation and the range 
is given in the Appendix and the above conclusions were drawn from 
Table A of the Appendix. Thus, it becomes apparent that if the range 
is to be used for sample sizes greater than, say, 15, some consideration 
should be given to making up practically for its loss in efficiency. 

The question naturally arises as to whether one should actually 
consume the additional time necessary in order to compute the sample 
standard deviation or use some method of estimating dispersion based 
on ranges of groups which for all practical purposes would be nearly as 
efficient as the standard deviation and yet be swift and simple in 
character. The latter technique would certainly be preferable in cases, 
for example, where shop personnel are required to construct control 
charts and a considerable training period would have to be undertaken 
for gaining confidence in the use of the standard deviation. Some have 
advocated the use of what is now called the “mean range”¹ whereby 
a sample is divided into groups (usually in the order in which the 
observations were taken), the range of each group determined and the 
average range of all the groups in the sample used to estimate the 
standard deviation of the lot sampled. Thus, suppose that we have a 
sample of 24 items, then we could divide the sample into two groups of 
12 items each, 12 groups of two items each, three groups of eight, eight 
groups of three, four groups of six or six groups of four. In the division 
of the sample of 24 items into four groups of six, for example, we find 
the range of each group of six observations and divide the sum of the 
four ranges thus obtained by four in order to get the mean or average 
range. This mean or average range for group size six would then be di- 
vided by the appropriate factor, 2.534 (see p. 39 of American War 
Standard “Control Chart Method of Controlling Quality During Pro- 
duction” Z1.3-1942), to obtain an estimate of the standard deviation 
of the normal population sampled. 
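
The mean-range procedure just described can be sketched in a few lines. This is a minimal illustration, not the authors' own computation: the function name is hypothetical, and the 24 observations are simply the first 24 values of the 30-item sample quoted later in the paper, borrowed here as an assumption so that the example has data. The divisor 2.534 for groups of six is the factor cited in the text.

```python
def mean_range_sigma(sample, group_size, d_n):
    """Estimate sigma as (average range of consecutive groups) / d_n."""
    if len(sample) % group_size != 0:
        raise ValueError("sample size must be a multiple of the group size")
    # Groups are formed in the order in which the observations were taken.
    groups = [sample[i:i + group_size] for i in range(0, len(sample), group_size)]
    ranges = [max(g) - min(g) for g in groups]
    mean_range = sum(ranges) / len(ranges)
    return mean_range / d_n


# Assumed data: first 24 values of the paper's later 30-item example.
observations = [6.5, 7.1, 6.9, 7.1, 6.3, 6.7, 7.8, 7.1, 7.1, 7.3, 7.3, 6.5,
                6.9, 6.7, 7.5, 7.3, 7.1, 7.8, 7.3, 7.3, 6.7, 7.1, 7.3, 7.1]

sigma_hat = mean_range_sigma(observations, group_size=6, d_n=2.534)
print(f"estimated sigma from four groups of six: {sigma_hat:.3f}")
```

The four group ranges here are .8, 1.3, 1.1 and .6, so the mean range is .95 before division by the factor.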

The question now arises as to just what sub-division of a sample 
should be made. For example, which of the sub-divisions in the preceding 
paragraph for the sample of 24 items should be used? This question 
is answered in the Appendix by using what is called the “Best 
Unbiased Estimate” of the population standard deviation based on 
the group ranges. Essentially, we determine that division of the sample 
which will give the best precision in estimating the standard deviation 
of the population sampled or, what is the same thing, we find 
which mode of sub-division will give the smallest standard deviation 
of the estimate of population dispersion. For the sample of 24 items 
referred to above, the results are tabulated below. 


1 For an analogue of this proposal, the reader is referred to a paper by Davies and Pearson, “Methods 
of Estimating from Samples the Population Standard Deviation,” Supplement to the Journal of the 
Royal Statistical Society, Vol. 1, No. 1. 


                            S.D. of Estimated 
Mode of Sub-Division        Standard Deviation 
 1 Group  of 24                   .183 
 2 Groups of 12                   .169 
 3 Groups of  8                   .166 
 4 Groups of  6                   .167 
 6 Groups of  4                   .175 
 8 Groups of  3                   .186 
12 Groups of  2                   .218 


It is noted in the above Table that dividing the sample into three 
groups of eight items each gives a smaller standard deviation (.166) 
than any other method of sub-division into groups of equal size. 
Hence, for a sample of 24 items the best method of making the division 
would be to use three groups of eight (no division of the sample into 
groups of unequal size will give a smaller standard deviation for the 
total sample size of 24). A further examination of the Table shows that 
a division into four groups of six for the sample of size 24 is better than 
a division into six groups of four. Moreover, a division into two groups 
of 12 is somewhat better than a division into six groups of four. As a 
general statement the best unbiased estimate of the population standard 
deviation using range is determined by dividing the sample into groups 
of size about eight (see the Appendix). 
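
The comparison in the table can be checked from the moment constants of the range. The sketch below assumes that the group ranges are independent, so that the estimate based on the mean of r ranges of size n has standard deviation (k_n/d_n)/sqrt(r) in units of sigma; the values of d_n and k_n are copied from Table B of the Appendix, and the function and variable names are hypothetical.

```python
import math

# (d_n, k_n): mean and standard deviation of the range, in units of sigma,
# copied from Table B of the Appendix.
RANGE_MOMENTS = {
    2: (1.1284, 0.85250),
    3: (1.6926, 0.88821),
    4: (2.0587, 0.87982),
    6: (2.5344, 0.84793),
    8: (2.8472, 0.81975),
    12: (3.2584, 0.77846),
}


def sd_of_mean_range_estimate(group_size, n_groups):
    """S.D. of (mean range / d_n) for n_groups independent groups."""
    d_n, k_n = RANGE_MOMENTS[group_size]
    return (k_n / d_n) / math.sqrt(n_groups)


TOTAL = 24
sd_by_group_size = {
    n: sd_of_mean_range_estimate(n, TOTAL // n) for n in RANGE_MOMENTS
}
best_group_size = min(sd_by_group_size, key=sd_by_group_size.get)

for n, sd in sorted(sd_by_group_size.items(), reverse=True):
    print(f"{TOTAL // n:2d} groups of {n:2d}: {sd:.3f}")
print("smallest S.D. at group size", best_group_size)
```

The single range of all 24 items is omitted because Table B stops at n = 12; its precision (.1828) is given directly in Table A.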

In case the sample size is odd, say for example 13, then the number 
of items in each group will not be equal. However, as shown in the Ap- 
pendix, we find that the best method for dividing a sample of size 13 
is to use one group of size 6 and another group of size 7. 

In Table I the recommended method for dividing the total sample 
size is given for sample sizes ranging from 2 to 100. For the best un- 
biased estimate based on group ranges, the sample will always be 
broken up into some of the group sizes 6, 7, 8, 9, or 10, with groups of 
size 7 and 8 predominating (see column two,² Table I). Column three 
of Table I gives the coefficients by which to multiply the sum of ranges 


2 Column two, and subsequent columns referred to, indicate columns headed by circled numbers, 
i.e., ①, ②, etc. 
of equal group sizes in order to obtain the best unbiased estimate of the 
population standard deviation, $\sigma$. As an example, suppose that we have 
the following sample of 30 items: 6.5, 7.1, 6.9, 7.1, 6.3, 6.7, 7.8, 7.1, 
7.1, 7.3, 7.3, 6.5, 6.9, 6.7, 7.5, 7.3, 7.1, 7.8, 7.3, 7.3, 6.7, 7.1, 7.3, 7.1, 
7.1, 6.9, 6.5, 7.1, 7.3, 7.5. Then using Table I we should divide the 
above sample of 30 items (in the order in which the observations occurred, 
for instance) into 2 groups of 7 items each and 2 groups of 8 
items each (see column two, Table I, for a total sample size of 30). The 
resulting groups are: 


Group 1   Group 2   Group 3   Group 4 
  6.5       7.1       7.5       7.3 
  7.1       7.1       7.3       7.1 
  6.9       7.3       7.1       7.1 
  7.1       7.3       7.8       6.9 
  6.3       6.5       7.3       6.5 
  6.7       6.9       7.3       7.1 
  7.8       6.7       6.7       7.3 
                      7.1       7.5 


The two ranges for group size 7 are 1.5 and .8, whereas the two 
ranges for group size 8 are 1.1 and 1.0. The sum of the two ranges of 
group size 7 (2.3) times .08619 (see column three of Table I for 
coefficients for various group sizes) is added to the sum of the ranges of 
group size 8 (2.1) times the factor .09374. The result gives an estimated 
population standard deviation of .395. In case we are confronted 
with a sample of say 59 items we divide the sample into five groups of 
seven items each and three groups of eight items each (see column two, 
Table I). We then add the ranges obtained from groups of size seven 
and multiply this sum by .04384. To this result we add the product of 
.04768 and the sum of three ranges obtained from groups of size eight, 
thus finding the best unbiased estimate of the standard deviation of 
the normal population sampled when a linear function of group ranges 
is used. Column four, Table I, gives the standard deviation or precision 
of the estimate found by use of columns two and three. Column five 
gives the standard deviation or precision of estimate based on the 
single range of the whole sample. In this connection, it may be noted 
from columns four and five that an estimate obtained from a sample 
of size 29 using the recommended method of dividing the sample into 
groups in accordance with columns two and three, Table I, is just as 
precise as that of using the single range of a total sample size of 44 
items. This gives an indication of the efficiency which will be gained 
by dividing the sample into groups rather than using the range of the 
entire sample. Column six, Table I, gives the standard deviation or 
precision of the root-mean-square estimate,³ 

$$\frac{s}{c_n} = \frac{1}{c_n}\sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$$

which is the most efficient sample statistic for estimating population 
standard deviation. Examination of columns four and six will show 
that a sample of size 17 using the sample standard deviation has a 
precision of .1781 whereas a sample of size 21 using the best unbiased 
estimate based on group ranges has a precision of .1779. By similar 
comparisons using columns four, five and six we get an idea of the effi- 
ciency of the proposed method of using ranges of groups as compared 
to using the maximum dispersion for the total sample size or the 
sample standard deviation. 
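
The worked example for the sample of 30 items can be reproduced as follows. This is only a sketch of the arithmetic described above: the coefficients .08619 and .09374 are the Table I values quoted in the text for a sample of 30, while the helper names are hypothetical.

```python
SAMPLE = [6.5, 7.1, 6.9, 7.1, 6.3, 6.7, 7.8, 7.1, 7.1, 7.3,
          7.3, 6.5, 6.9, 6.7, 7.5, 7.3, 7.1, 7.8, 7.3, 7.3,
          6.7, 7.1, 7.3, 7.1, 7.1, 6.9, 6.5, 7.1, 7.3, 7.5]

GROUP_SIZES = [7, 7, 8, 8]                 # division given for N = 30
COEFFICIENTS = {7: 0.08619, 8: 0.09374}    # quoted from column three, Table I


def split_in_order(sample, sizes):
    """Cut the sample into consecutive groups of the given sizes."""
    if sum(sizes) != len(sample):
        raise ValueError("group sizes must add up to the sample size")
    groups, start = [], 0
    for size in sizes:
        groups.append(sample[start:start + size])
        start += size
    return groups


groups = split_in_order(SAMPLE, GROUP_SIZES)
ranges = [max(g) - min(g) for g in groups]

# Linear function of the group ranges: each range times its coefficient.
sigma_hat = sum(COEFFICIENTS[len(g)] * r for g, r in zip(groups, ranges))

print("group ranges:", [round(r, 1) for r in ranges])
print(f"estimated sigma: {sigma_hat:.3f}")
```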

Column seven, Table I, headed “Approximate Percentage Points,” 
gives appropriate factors by which to multiply an accurate estimate⁴ of 
the population standard deviation, $\sigma$, in order to obtain the indicated 
probability or control limits for a sample estimate of population 
standard deviation based on ranges of groups. The method for obtaining 
these approximate percentage points is described in the Appendix. 
Column eight of Table I gives the measure of skewness, $\alpha_3$, of the 
estimated population standard deviation. 

It should be noted that in control chart analyses, it will be convenient 
in using Table I to plot the estimated $\sigma$ of the sample (especially when 
the sample is divided into groups of unequal size) as the measure of 
dispersion rather than some “mean” range of the groups in the sample. 

Table II gives approximate percentage points for an estimated $\sigma$ 
based on r groups of equal size, should it be convenient or practicable 
to so divide samples. In this case, the estimated $\sigma$ is obtained simply by 
dividing the average range for a fixed group size by the factor $d_n$ (see 
the Appendix). Column two of Table II gives the standard deviation 
of the estimate of $\sigma$ based on r groups of size n. Column three of Table 
II gives the factors by which to multiply a “long-term” estimate of $\sigma$ in order 
to obtain approximate control limits or percentage points for the estimated 
$\sigma$ of a sample of size rn. Column two of Table II may be compared 
with columns four, five, and six of Table I to determine the relative 
efficiency of the mean range (or estimated $\sigma$) of r groups of equal size n 


3 See appendix for definition and values of $c_n$. 
4 I.e., a “long term” value based on past data. 
TABLE II 

FACTORS FOR $\bar{R}_n/d_n$ 

(r = number of groups; S.D. = standard deviation of the estimate $\bar{R}_n/d_n$; 
? = digit illegible in the source copy) 

                     Approximate Percentage Points 
  r    S.D.     .995   .990   .950   .050   .010   .005    $\alpha_3$ 

Group Size n = 6 
  1   .3346     .30    .34    .49    1.59   1.88   1.98    .4344 
  2   .2366     .47    .51    .63    1.41   1.60   1.67    .3072 
  3   .1931     .54    .58    .69    1.33   1.48   1.53    .2508 
  4   .1673     .60    .63    .73    1.28   1.41   1.46    .2172 
  5   .1496     .64    .67    .76    1.25   1.37   1.40    .1943 
  6   .1366     .67    .69    .78    1.23   1.33   1.37    .1773 
  7   .1265     .69    .72    .80    1.21   1.31   1.34    .1642 
  8   .1183     .71    .7?    .81    1.20   1.29   1.32    .1536 
  9   .1115     .73    .7?    .82    1.19   1.2?   1.30    .1448 
 10   .1058     .74    .76    .83    1.18   1.25   1.28    .1374 
 11   .1009     .7?    .77    .84    1.17   1.24   1.27    .1310 
 12   .0966     .76    .78    .84    1.16   1.23   1.26    .1254 
 13   .0928     .77    .79    .85    1.16   1.22   1.25    .1205 
 14   .0894     .7?    .80    .86    1.15   1.22   1.24    .1161 
 15   .0864     .79    .80    .86    1.14   1.21   1.2?    .1122 

Group Size n = 7 
  1   .3081     .34    .39    .53    1.54   1.80   1.90    .4162 
  2   .2179     .50    .54    .66    1.38   1.55   1.62    .2943 
  3   .1779     .58    .61    .72    1.30   1.43   1.49    .2403 
  4   .1541     .62    .66    .75    1.26   1.38   1.42    .2081 
  5   .1378     .66    .70    .78    1.23   1.33   1.37    .1861 
  6   .1258     .69    .72    .80    1.21   1.31   1.34    .1699 
  7   .1164     .71    .74    .81    1.19   1.28   1.31    .1573 
  8   .1089     .73    .75    .82    1.18   1.26   1.29    .1471 
  9   .1027     .75    .77    .84    1.17   1.25   1.28    .1387 
 10   .0974     .76    .78    .84    1.16   1.24   1.26    .1316 
 11   .0929     .77    .79    .85    1.15   1.22   1.25    .1255 
 12   .0889     .78    .80    .85    1.15   1.21   1.24    .1201 
 13   .0855     .79    .81    .86    1.14   1.21   1.2?    .1154 
 14   .0824     .80    .81    .87    1.14   1.20   1.22    .1112 
 15   .0795     .80    .82    .87    1.13   1.19   1.21    .1075 

Group Size n = 8 
  1   .2879     .38    .42    .56    1.51   1.7?   1.85    .4062 
  2   .2036     .53    .57    .68    1.35   1.51   1.58    .2872 
  3   .1662     .60    .64    .73    1.28   1.40   1.45    .2345 
  4   .1440     .65    .68    .77    1.24   1.35   1.39    .2031 
  5   .1288     .68    .7?    .79    1.22   1.31   1.35    .1817 
  6   .1176     .7?    .74    .8?    1.20   1.29   1.32    .1658 
  7   .1088     .73    .76    .83    1.18   1.26   1.29    .1535 
  8   .1018     .75    .7?    .84    1.17   1.24   1.27    .1436 
  9   .0960     .7?    .78    .85    1.16   1.23   1.2?    .1354 
 10   .0910     .77    .79    .85    1.15   1.22   1.24    .1285 
 11   .0868     .7?    .80    .86    1.14   1.21   1.2?    .1225 
 12   .0831     .79    .81    .86    1.14   1.20   1.22    .1173 
 13   .0799     .80    .82    .87    1.13   1.19   1.21    .1127 
 14   .0770     .81    .83    .87    1.13   1.18   1.20    .1086 
 15   .0744     .81    .83    .88    1.12   1.18   1.20    .1049 


TABLE II (continued) 

FACTORS FOR $\bar{R}_n/d_n$ 

[The .995 column for group sizes 9 and 10 is not legible in the source copy.] 

                     Approximate Percentage Points 
  r    S.D.     .990   .950   .050   .010   .005    $\alpha_3$ 

Group Size n = 9 
  1   .2720     .45    .59    1.48   1.71   1.80    .3999 
  2   .1924     .60    .70    1.33   1.48   1.55    .2828 
  3   .1570     .66    .75    1.27   1.38   1.43    .2309 
  4   .1360     .70    .78    1.23   1.33   1.37    .2000 
  5   .1216     .73    .80    1.21   1.30   1.33    .1788 
  6   .1110     .75    .82    1.19   1.27   1.30    .1633 
  7   .1028     .77    .84    1.17   1.25   1.28    .1511 
  8   .0962     .78    .85    1.16   1.23   1.26    .1414 
  9   .0907     .79    .85    1.15   1.22   1.24    .1333 
 10   .0860     .80    .86    1.14   1.21   1.23    .1265 
 11   .0820     .81    .87    1.14   1.20   1.22    .1206 
 12   .0786     .82    .87    1.13   1.19   1.21    .1154 
 13   .0755     .83    .88    1.12   1.18   1.20    .1109 
 14   .0727     .84    .88    1.12   1.17   1.19    .1069 
 15   .0702     .84    .89    1.12   1.17   1.19    .1033 

Group Size n = 10 
  1   .2590     .48    .60    1.45   1.68   1.76    .3974 
  2   .1832     .61    .71    1.32   1.46   1.52    .2810 
  3   .1496     .67    .76    1.25   1.36   1.41    .2294 
  4   .1295     .72    .79    1.22   1.32   1.35    .1987 
  5   .1158     .74    .82    1.20   1.28   1.31    .1777 
  6   .1058     .76    .83    1.18   1.25   1.28    .1622 
  7   .0979     .78    .84    1.16   1.23   1.26    .1502 
  8   .0916     .79    .85    1.15   1.22   1.24    .1405 
  9   .0863     .81    .86    1.14   1.21   1.23    .1325 
 10   .0819     .82    .87    1.14   1.20   1.22    .1257 
 11   .0781     .83    .87    1.13   1.19   1.21    .1198 
 12   .0748     .83    .88    1.12   1.18   1.20    .1147 
 13   .0718     .84    .88    1.12   1.17   1.19    .1102 
 14   .0692     .84    .89    1.11   1.17   1.18    .1062 
 15   .0669     .85    .89    1.11   1.16   1.18    .1026 

with that of (1) the best unbiased estimate based on group ranges, 
(2) the range found from the total sample size, or (3) the sample stand- 
ard deviation. 

In using the Tables of this paper, it should be borne in mind that 
mean values, standard deviations and percentage points are factors 
which should be multiplied by the population standard deviation, $\sigma$. 

Finally, it is remarked that the results of this paper are based on 
random samples from a single normal universe. In case the data display 
trends, then appropriate methods for estimating $\sigma$ should be 
taken into consideration. A reason for dividing a sample into groups in 
the order in which the observations were taken is, of course, to reduce 
the effect of trends should they exist in the data. A general treatment 
of the appropriate method of sub-dividing a sample is dependent on 
particular applications and beyond the scope of the present paper. The 
authors, however, have found it advantageous in many cases to group 
observations in the order taken. 


APPENDIX 


Precision of Sample Statistics. A sample statistic is said to be an 
unbiased estimate when its expected or mean value is equal to the 
appropriate population parameter. For example, the mean or expected 
value of the range in samples of size 5 from a normal parent is given by 

$$E(R_5) = 2.326\,\sigma$$

where $\sigma$ is the standard deviation of the normal universe sampled. Thus, 
the statistic $R_5$ when used as an estimate of $\sigma$ is biased. However, the 
statistic $R_5/2.326$ is an unbiased estimate of the standard deviation $\sigma$. 
In general, if 

$$E(R_n) = d_n\sigma \qquad (n = \text{sample size})$$

we may obtain an unbiased estimate of the population standard deviation 
$\sigma$ by using the statistic 

$$\frac{R_n}{d_n}.$$

In speaking of the sample standard deviation 

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

we have for a normal population the following relation 

$$E(s) = \sqrt{\frac{2}{n}}\;\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}\,\sigma \qquad (n = \text{sample size}).$$

For brevity, it will be convenient to put 

$$c_n = \sqrt{\frac{2}{n}}\;\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}.$$

Thus $s/c_n$ gives an unbiased estimate of $\sigma$. 
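
As a numerical illustration of the bias correction, the factor c_n can be evaluated with the gamma function from the Python standard library. This is a minimal sketch with hypothetical function names; it uses the root-mean-square deviation with divisor n, as in the definition of s above.

```python
import math


def c_n(n):
    """Bias factor: E(s) = c_n * sigma for a normal sample of size n,
    where s is the root-mean-square deviation (divisor n)."""
    return math.sqrt(2.0 / n) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def unbiased_sigma_from_rms(sample):
    """Return s / c_n, an unbiased estimate of sigma for normal data."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return s / c_n(n)


for n in (2, 5, 10, 25):
    print(f"c_{n} = {c_n(n):.4f}")
```

Because c_n is less than one and approaches one as n grows, dividing by it enlarges s slightly, with the correction mattering most for small samples.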

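One sample statistic A is said to be more precise than another sample 
statistic B when the standard deviation of A (corrected for bias) is 
less than the standard deviation of B (corrected for bias). Thus, in comparing 
the precision of the range and standard deviation, we examine 
the relation between the variances of the two estimates, i.e. 

$$E\left[\frac{R_n}{d_n} - E\left(\frac{R_n}{d_n}\right)\right]^2 \quad\text{and}\quad E\left[\frac{s}{c_n} - E\left(\frac{s}{c_n}\right)\right]^2.$$

Since the moments of the range are known,⁵ we shall define 

$$E(R_n - d_n\sigma)^2 = k_n^2\sigma^2$$

so that 

$$E\left(\frac{R_n}{d_n} - \sigma\right)^2 \quad\text{becomes}\quad \frac{k_n^2}{d_n^2}\,\sigma^2.$$

Also, since 

$$E(s^2) = \frac{n-1}{n}\,\sigma^2,$$

then 

$$E\left(\frac{s}{c_n} - \sigma\right)^2 = \frac{(n-1) - nc_n^2}{nc_n^2}\,\sigma^2.$$

In Table A, on page 237, we have tabulated 

$$\sqrt{\frac{(n-1) - nc_n^2}{nc_n^2}} \quad\text{and}\quad \frac{k_n}{d_n}$$

as the standard deviations or precisions of the sample statistics 

$$\frac{s}{c_n} = \frac{1}{c_n}\sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad\text{and}\quad \frac{R_n}{d_n}.$$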
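
The two precision formulas can be evaluated directly, which gives a way of checking the statement in the main text that the standard deviation from a sample of 10 is about as efficient as the range from a sample of 12. The sketch below uses hypothetical function names; the values d_12 = 3.2584 and k_12 = .77846 are taken from Table B.

```python
import math


def c_n(n):
    """Bias factor for the root-mean-square deviation of a normal sample."""
    return math.sqrt(2.0 / n) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def sd_of_rms_estimate(n):
    """S.D. of s / c_n in units of sigma."""
    c2 = c_n(n) ** 2
    return math.sqrt(((n - 1) - n * c2) / (n * c2))


def sd_of_range_estimate(d_n, k_n):
    """S.D. of R_n / d_n in units of sigma."""
    return k_n / d_n


sd_s_10 = sd_of_rms_estimate(10)
sd_r_12 = sd_of_range_estimate(3.2584, 0.77846)

print(f"S.D. of s/c_n for n = 10: {sd_s_10:.4f}")
print(f"S.D. of R_n/d_n for n = 12: {sd_r_12:.4f}")
```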


TABLE A 

Sample     S.D. of     S.D. of 
 Size      $s/c_n$     $R_n/d_n$ 
   2        .7555       .7555 
   3        .5227       .5248 
   4        .4220       .4274 
   5        .3630       .3714 
   6        .3233       .3346 
   7        .2941       .3081 
   8        .2716       .2879 
   9        .2536       .2720 
  10        .2388       .2590 
  11        .2262       .2481 
  12        .2155       .2389 
  13        .2062       .2308 
  14        .1979       .2237 
  15        .1906       .2175 
  16        .1840       .2121 
  17        .1781       .2071 
  18        .1727       .2027 
  19        .1678       .1987 
  20        .1632       .1952 
  21        .1591       .1916 
  22        .1552       .1885 
  23        .1516       .1856 
  24        .1482       .1828 
  25        .1451       .1804 


5 See Reference [1] at end of paper. 


Best Unbiased Estimate Based on Linear Function of Ranges. Consider 
a sample of N items which is broken down into m groups of size 
$n_1, n_2, n_3, \ldots, n_m$ such that 

$$N = \sum_{i=1}^{m} n_i.$$

We will designate the ranges of the m groups as 

$$R_{n_1}, R_{n_2}, R_{n_3}, \ldots, R_{n_m}$$

respectively. 


The best unbiased estimate of the population standard deviation, $\sigma$, 
is defined as that estimate which (1) is unbiased and (2) possesses minimum 
variance. 

Suppose we form a linear function of the group ranges: 

$$\text{Est. } \sigma = a_1 R_{n_1} + a_2 R_{n_2} + \cdots + a_m R_{n_m}.$$

In order for $\sum_{i=1}^{m} a_i R_{n_i}$ to be the best unbiased estimate, we must 
determine the coefficients $a_i$ such that 

$$(1)\quad E\left[\sum_{i=1}^{m} a_i R_{n_i}\right] = \sigma$$

and 

$$(2)\quad E\left[\sum_{i=1}^{m} a_i R_{n_i} - \sigma\right]^2 \text{ shall be a minimum.}$$

We may immediately write that the appropriate coefficients are given 
by 

$$a_i = \frac{d_{n_i}/k_{n_i}^2}{\sum_{j=1}^{m}\left(d_{n_j}/k_{n_j}\right)^2}$$

since it is known⁶ that the linear combination of random variables 
which gives the best unbiased estimate has coefficients or “weights” 
which are inversely proportional to the variances of the variables. 

Moments of Est. $\sigma$. From the basic principles of linear compounds of 
variables, we have: 

$$\text{Variance of Est. } \sigma = \sum_{i=1}^{m} a_i^2 \cdot \text{variance of } R_{n_i} = \frac{\sigma^2}{\sum_{i=1}^{m}\left(\frac{d_{n_i}}{k_{n_i}}\right)^2}.$$

For the measure of skewness, 

$$\alpha_3 \text{ of Est. } \sigma = \frac{\sum_{i=1}^{m}\left(\frac{d_{n_i}}{k_{n_i}}\right)^3 \alpha_3(R_{n_i})}{\left[\sum_{i=1}^{m}\left(\frac{d_{n_i}}{k_{n_i}}\right)^2\right]^{3/2}}.$$

Values of the standard deviation of Est. $\sigma$ and $\alpha_3$ of Est. $\sigma$ were 
computed from the above formulas. 


The divisions of samples into the group sizes recommended in Table 
I were obtained by finding that particular division which makes 

$$\sum_{i=1}^{m}\left(\frac{d_{n_i}}{k_{n_i}}\right)^2$$

a maximum for a given sample size. 
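
The coefficient, variance and skewness formulas can be checked numerically against the figures quoted in the main text. The sketch below is an illustration under the stated formulas, with hypothetical names; the constants d_n, k_n and the skewness of the range are copied from Table B for group sizes 6 to 10.

```python
import math

# n: (d_n, k_n, alpha_3 of the range), copied from Table B.
RANGE_MOMENTS = {
    6: (2.5344, 0.84793, 0.43443),
    7: (2.7043, 0.83316, 0.41621),
    8: (2.8472, 0.81975, 0.40619),
    9: (2.9700, 0.80789, 0.39986),
    10: (3.0775, 0.79718, 0.39737),
}


def best_linear_estimate(group_sizes):
    """Return (coefficient per group, S.D. in units of sigma, skewness)."""
    total = sum((RANGE_MOMENTS[n][0] / RANGE_MOMENTS[n][1]) ** 2
                for n in group_sizes)
    coeffs = [RANGE_MOMENTS[n][0] / RANGE_MOMENTS[n][1] ** 2 / total
              for n in group_sizes]
    sd = 1.0 / math.sqrt(total)
    skew = sum((RANGE_MOMENTS[n][0] / RANGE_MOMENTS[n][1]) ** 3
               * RANGE_MOMENTS[n][2] for n in group_sizes) / total ** 1.5
    return coeffs, sd, skew


coeffs_30, sd_30, skew_30 = best_linear_estimate([7, 7, 8, 8])
coeffs_59, sd_59, skew_59 = best_linear_estimate([7] * 5 + [8] * 3)

print("N = 30 coefficients:", [round(a, 5) for a in coeffs_30])
print("N = 59 coefficients:", sorted({round(a, 5) for a in coeffs_59}))
print(f"N = 30: S.D. = {sd_30:.4f} sigma, skewness = {skew_30:.4f}")
```

The coefficients come out close to the .08619 and .09374 quoted for a sample of 30, and to the .04384 and .04768 quoted for a sample of 59. The weighted sum of the d_n values equals one, which is the unbiasedness condition (1).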





6 See, for example, S. S. Wilks, Mathematical Statistics, Sect. 4.42, Princeton University Press, 1946. 
The values of $\alpha_3$ were used to fit a Pearson Type III frequency curve 
to the moments and thus to approximate the percentage points when 
the number of groups was greater than three. 

The statistic 

$$u = \frac{\chi^2 - \nu}{\sqrt{2\nu}},$$

where $\chi^2$ is distributed as $\chi^2$ with $\nu$ degrees of freedom, has zero mean, 
unit variance, and $\sqrt{8/\nu}$ for the value of $\alpha_3$. $u_P$, the value of $u$ such that 
Prob $(u > u_P) = P$, was computed as 

$$u_P = \frac{\chi_P^2 - \nu}{\sqrt{2\nu}} \quad\text{where}\quad \nu = \frac{8}{\alpha_3^2}.$$

$\chi_P^2$ was found in the published tables of percentage points of the 
$\chi^2$-distribution⁷ by interpolation or for large $\nu$ by assuming that $\sqrt{2\chi^2}$ is 
normally distributed around mean $\sqrt{2\nu - 1}$ with unit variance. 

After obtaining the percentage points $u_P$ for the reduced variable 
the percentage points for Est. $\sigma$ were computed by 

$$\{\text{Est. } \sigma\}_P = \text{Mean}\,\{\text{Est. } \sigma\} + u_P \cdot \text{S.D. of } \{\text{Est. } \sigma\}.$$
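
The large-degrees-of-freedom branch of this procedure is easy to sketch with the standard library alone. The code below is an approximation under stated assumptions, not the authors' table-lookup method: it always uses the normal approximation for the square root of twice chi-square, it works in units of sigma so that the mean of the unbiased estimate is 1, and the function names are hypothetical. The inputs .0744 and .1049 are the Table II entries for 15 groups of size 8.

```python
import math
from statistics import NormalDist


def chi2_upper_point_normal_approx(P, nu):
    """chi-square value exceeded with probability P, treating sqrt(2*chi2)
    as normal with mean sqrt(2*nu - 1) and unit variance."""
    z = NormalDist().inv_cdf(1.0 - P)
    return 0.5 * (z + math.sqrt(2.0 * nu - 1.0)) ** 2


def percentage_point(P, sd, alpha3):
    """Approximate point of Est. sigma (in units of sigma) exceeded with
    probability P, from a Type III (scaled chi-square) fit."""
    nu = 8.0 / alpha3 ** 2
    u_P = (chi2_upper_point_normal_approx(P, nu) - nu) / math.sqrt(2.0 * nu)
    return 1.0 + u_P * sd


upper = percentage_point(0.050, sd=0.0744, alpha3=0.1049)
lower = percentage_point(0.950, sd=0.0744, alpha3=0.1049)
print(f".050 point: {upper:.3f}   .950 point: {lower:.3f}")
```

For this case the results fall near the 1.12 and .88 shown in Table II; for few groups the skewness is larger and the published tables of chi-square would be the safer route.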


The above method was modified for the cases involving one, two, or 
three ranges. 

For one range the percentage points of the estimate of $\sigma$ were made 
by dividing the percentage points of the range (this distribution has 
been tabulated for sample sizes 2 to 20) by $d_n$. 

For two or three ranges the method used involved $\alpha_4$ as well as $\alpha_3$, 
and ultimately obtained the percentage points by a transformation 
from the Pearson Type I frequency distribution or Beta distribution. 

$\alpha_4$, or the measure of peakedness of the distribution of the “mean” 
range, was determined from the formula: 

$$\alpha_4 = 3 + \frac{\sum_{i=1}^{m}\left(\frac{d_{n_i}}{k_{n_i}}\right)^4\left[\alpha_4(R_{n_i}) - 3\right]}{\left[\sum_{i=1}^{m}\left(\frac{d_{n_i}}{k_{n_i}}\right)^2\right]^2}$$

It is believed that the figures quoted for the approximate percentage 
points in Tables I, II, and III are accurate within one unit in the last 
place given. 


7 See Reference [2]. 
8 See Reference [3]. 

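Table B, at the end of the Appendix, gives appropriate moment 
constants and values of 

$$\frac{d_n}{k_n^2},\quad \left(\frac{d_n}{k_n}\right)^2,\quad \left(\frac{d_n}{k_n}\right)^3\alpha_3(R_n) \quad\text{and}\quad \left(\frac{d_n}{k_n}\right)^4\left[\alpha_4(R_n) - 3\right]$$

for the ranges of sample sizes from 2 to 12. 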

It should be noted that from the formula 

$$\frac{\sigma^2}{\sum_{i=1}^{m}\left(\frac{d_{n_i}}{k_{n_i}}\right)^2}$$

for the variance of the best unbiased estimate of $\sigma$ from a given set of 
ranges of groups of sizes $n_1, n_2, \ldots, n_m$ it can be proved that for a very 
large sample the best estimate of $\sigma$ by use of ranges may be made by a 
random breakdown of the total sample into groups of size 8 each. If 
the total sample size is N and it is divided into N/n samples of n each, 
the variance of the estimated $\sigma$ becomes 

$$\frac{\sigma^2}{\frac{N}{n}\left(\frac{d_n}{k_n}\right)^2}$$

and this is a minimum when $\frac{1}{n}\left(\frac{d_n}{k_n}\right)^2$ is a maximum. From the following 
values 

  n     $\frac{1}{n}\left(\frac{d_n}{k_n}\right)^2$ 
  6        1.489 
  7        1.505 
  8        1.508 
  9        1.502 
 10        1.490 

it is seen that the best group size is 8. However, group sizes of either 
7 or 9 are almost as good as 8. For total sample size N, where N is 32 
or larger, the rule for breaking the total sample into subgroups becomes: 
Divide N by 8 and let the remainder be called S. Then for 
S=0 all subgroups are size 8 
S=1 one group size 9, all others size 8 
S=2 two groups size 9, all others size 8 
S=3 five groups size 7, all others size 8 
S=4 four groups size 7, all others size 8 
S=5 three groups size 7, all others size 8 
S=6 two groups size 7, all others size 8 
S=7 one group size 7, all others size 8 
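
The remainder rule above lends itself to a short routine. The following is a sketch with a hypothetical function name; it applies only to N of 32 or larger, as the text specifies, and it also recomputes the per-item efficiency from the Table B column for (d_n/k_n) squared.

```python
def recommended_group_sizes(N):
    """Group sizes for a total sample of N >= 32 under the remainder rule."""
    if N < 32:
        raise ValueError("the rule is stated only for N of 32 or larger")
    quotient, S = divmod(N, 8)
    if S == 0:
        return [8] * quotient
    if S <= 2:
        # S groups of nine absorb the remainder; the rest are eights.
        return [9] * S + [8] * (quotient - S)
    # For S = 3..7 use (8 - S) groups of seven; the rest are eights.
    sevens = 8 - S
    return [7] * sevens + [8] * (quotient + 1 - sevens)


# (d_n / k_n)^2 from Table B, divided by n to give efficiency per item.
D_OVER_K_SQUARED = {6: 8.9337, 7: 10.535, 8: 12.063, 9: 13.515, 10: 14.903}
per_item = {n: v / n for n, v in D_OVER_K_SQUARED.items()}
best_n = max(per_item, key=per_item.get)

sizes_59 = recommended_group_sizes(59)
print("N = 59:", sizes_59)
print("most efficient group size:", best_n)
```

For N = 59 the remainder is 3, and the routine returns five groups of seven and three groups of eight, the division quoted in the main text.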





















































TABLE B 
MOMENT CONSTANTS OF THE RANGE 

       Mean     Std. Dev. 
  n   ($d_n$)   ($k_n$)    $\alpha_3$   $\alpha_4$   $d_n/k_n^2$   $(d_n/k_n)^2$   $(d_n/k_n)^3\alpha_3$   $(d_n/k_n)^4(\alpha_4 - 3)$ 
  2   1.1284    .85250     .99527       3.8692       1.5527         1.7520           2.3080                  2.668 
  3   1.6926    .88821     .64541       3.2790       2.1455         3.6314           4.4663                  3.679 
  4   2.0587    .87982     .52257       3.1848       2.6595         5.4752           6.6949                  5.540 
  5   2.3259    .86394     .46443       3.1608       3.1162         7.2479           9.0623                  8.447 
  6   2.5344    .84793     .43443       3.1627       3.5250         8.9337          11.600                  12.99 
  7   2.7043    .83316     .41621       3.1686       3.8958        10.535           14.232                  18.71 
  8   2.8472    .81975     .40619       3.1767       4.2370        12.063           17.018                  25.71 
  9   2.9700    .80789     .39986       3.1840       4.5504        13.515           19.867                  33.61 
 10   3.0775    .79718     .39737       3.1964       4.8427        14.903           22.862                  43.62 
 11   3.1729    .78729     .39493       3.1994       5.1182        16.240           25.846                  52.59 
 12   3.2584    .77846     .39353       3.2040       5.3769        17.520           28.859                  62.62 


INTEREST RATES BY LOAN SIZE AND 
GEOGRAPHICAL REGION 


John M. Hartwell, Jr. 
Mutual Life Insurance Co. of N. Y. 


IN CONNECTION with its study of Policy Loan rates which culminated 
in the recently announced reduction of interest charges, the Mutual 
Life Insurance Co. of N. Y. conducted a survey among its correspond- 
ent banks. The survey solicited information on the rates charged by the 
banks for personal loans, ranging from $100 to $20,000. This survey 
was conducted during the first half of July 1946. 

The following summary is a compilation of the rate questionnaires 
returned by all banks. Slightly over 100 requests were sent to banks 
located in all 48 states. To date answers have been received from 93 
banks in 42 states. In most cases, the banks were in different cities; 
the exceptions were a few leading banking centers (New York, 
Philadelphia, Atlanta, Chicago, St. Louis, and San Francisco) where 
requests were addressed to several different banks. 

The questionnaire requested information on typical rates by size 
of loan for both installment collateral loans and for demand or time 
collateral loans. Only a minority of the banks found it possible to answer 
the section on installment loans, in part perhaps because of uncer- 
tainties introduced by questions of maturity, service charges, and the 
specialized manner in which that type of transaction may be handled. 
A careful analysis of the installment loan questionnaires which were 
returned revealed cases in which there was doubt about assumptions 
made on the problems noted above. Furthermore, there was sufficient 
variation to prohibit a simple tabulation. Consequently all installment 
rates were eliminated from the final compilation. 

The demand and time loan section was answered by a majority of 
the banks and presented few problems of interpretation. The question- 
naire provided columns for indicating rates on either a simple interest 
or a discount basis. In most cases the answers were in simple interest 
form. In the few cases where the discount basis was used, and where it 
obviously applied to a whole year, the rate was converted to a simple 
interest basis. In order to simplify the final tabulation these converted 
rates were inserted in the nearest half per cent column in the table. 
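
The conversion formula itself is not stated in the text. A minimal sketch, assuming the usual one-year relation in which a loan discounted at rate d yields proceeds of 1 - d, so that the equivalent simple rate is d / (1 - d); the function names are hypothetical.

```python
def discount_to_simple(discount_rate: float) -> float:
    """Convert a one-year bank discount rate to the equivalent simple interest rate.

    Assumes interest d is deducted in advance from a loan of 1, leaving proceeds 1 - d.
    """
    if not 0 <= discount_rate < 1:
        raise ValueError("discount rate must be a fraction in [0, 1)")
    return discount_rate / (1 - discount_rate)


def nearest_half_per_cent(rate: float) -> float:
    """Express a fractional rate in per cent, rounded to the nearest half per cent."""
    return round(rate * 100 * 2) / 2


if __name__ == "__main__":
    for d in (0.04, 0.05, 0.06):
        simple = discount_to_simple(d)
        print(f"discount {d:.0%} -> simple {simple:.2%} -> column {nearest_half_per_cent(simple)}")
```

Under this assumption a 6 per cent discount corresponds to about 6.38 per cent simple interest and would be entered in the 6.5 column.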

The banks were aware that Mutual Life’s questionnaire arose in 
connection with policy loan rates. As a result several banks noted the 
specific rates they charged on loans with life insurance policies assigned 
as collateral. In a very few cases this was the only information provided, 
and those cases were included in the tabulation. 

In addition, some banks indicated a range of rates for each size. 
Where the range was small and consistent, both figures were used. 
Where the range was large and erratic, or included apparently unique 
cases, the return was usually eliminated. It is well known that the rate 
charged by a bank is not completely uniform for loans of the same size. 


TABLE 2 
WEIGHTED AVERAGE RATES OF INTEREST 

Size of Loan    New York City    South & West    North & East
  $    100          5%              5.8%            5.9%
       500          5               5.4             5.6
       700          4               5.4             5.3
     1,000          3.5             4.8             5.2
     1,200          3.5             4.4             5.0
     1,400          3.5             4.4             5.0
     1,600          3.5             4.4             4.8
     1,800          3.5             4.3             4.7
     2,000          3.5             4.3             4.5
     2,500          3.5             4.2             4.2
     3,000          3.3             4.1             4.1
     3,500          3.3             4.1             4.2
     5,000          3.2             3.9             3.9
    10,000          2.8             3.6             3.4
    12,000          2.8             3.5             3.3
    15,000          2.8             3.5             3.3
    20,000          2.7             3.4             3.3


Fluctuations up or down will depend on the relationship between the 
bank and the borrower. These fluctuations were of interest in the gen- 
eral study, but extreme fluctuations were not adaptable to tabulation. 
Consequently the rates given below do not represent the extreme cases. 

A few banks in the larger cities reported they did not have sufficient 
business of the type under consideration to warrant filling in the form. 
Their loans were predominantly larger than the size classifications 
requested. 

After analyzing the returns to make the selection described above, 
there were 63 out of the total 93 available for tabulation. These were 
then divided into three geographic areas: New York City, other Northern 
and Eastern cities, and Southern and Western cities. The breakdown 
was based on that used by the Federal Reserve System in its series 
on Rates on Customer’s Loans. Many of the cities, of course, were not 
included in the list used by the Federal Reserve System, but the two 
areas were delineated by that list. 

Table 1 shows the number of banks indicating each specified rate for 
each size of loan. The number for each combination cannot exceed the 
total number of banks recorded in that area. But the total number 
for a given size can exceed the number of banks because of the multiple 
answers noted above. 

The table suggests several conclusions about the pattern of interest 
rates. One interesting observation is the wide range for any particular 
size. This dispersion is even more remarkable when it is remembered 
that random cases have been eliminated so far as possible. Further- 
more, the questionnaire applied only to secured loans. Here is abun- 
dant evidence that that economist’s abstraction “the interest rate” is a 
phantom figure when it comes to the practical business of making loans, 
even when some allowance is made for differences in type of collateral. 

Another pertinent suggestion from these returns is the small differ- 
ence between rates in the North and East and those in the South and 
West. Table 2 gives weighted averages for the three geographic areas. 
For loans running from $1,000 to $2,000 these returns show rates in 
the South and West to be lower than in the North and East. It is only 
above $5,000 that the banks in the North and East offer an advantage. 
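
The weights behind Table 2 are not spelled out; a natural reading is that each rate is weighted by the number of banks quoting it for a given loan size, as tallied in Table 1. The sketch below assumes that reading. The function name and the tally are hypothetical and are not survey data.

```python
def weighted_average_rate(tally: dict[float, int]) -> float:
    """Average of quoted rates (per cent), each weighted by the number of banks quoting it."""
    total_banks = sum(tally.values())
    if total_banks == 0:
        raise ValueError("tally contains no banks")
    return sum(rate * count for rate, count in tally.items()) / total_banks


if __name__ == "__main__":
    # Hypothetical tally for one loan size in one area: rate (per cent) -> number of banks.
    tally = {4.0: 6, 4.5: 3, 5.0: 8, 6.0: 3}
    print(f"{weighted_average_rate(tally):.1f} per cent")
```

Because a bank giving a small range of rates is counted under both figures, the total weight for a loan size can exceed the number of banks, as the text notes for Table 1.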

The downward trend of rates as the size of loan increases is very 
clear. There are several reasons for believing it may be more marked 
than the tabulation indicates. Some banks noted along with their rate 
schedule that small loans, under about $1,000, were usually handled in 
a small loan department, where presumably the rates are higher. Also, 
some banks did not quote rates for the small sizes, for the same reasons 
probably. And some banks noted they did not handle small loans as a 
rule, preferring to let the finance companies handle that business. 

There is no doubt that the tabulation does not include all the rates 
that banks may and do charge. But it does offer a working sample of 
the pattern within which rates fall over the country as a whole. 








“COST-UTILITY” AS A MEASURE OF THE 
EFFICIENCY OF A TEST 


Joseph Berkson, M.D., D.Sc. 
Division of Biometry and Medical Statistics, Mayo Clinic, 
Rochester, Minnesota* 


Frequently a test result or measure x applying to an individual 
is used to “predict” not a quantity y of the individual, but 
instead, whether the individual belongs to A or B of two mutually 
exclusive categories. The question discussed is how to 
measure the effectiveness of the test x in achieving the 
dichotomization. Measures such as the biserial or tetrachoric r 
are considered open to objection on conceptual and operational 
grounds. A method is advanced that consists in determining 
(1) the utility, defined as the fraction of individuals 
actually belonging to category A which the test correctly 
designates as A and (2) the cost, defined as the fraction of 
individuals actually belonging to category B which the test 
incorrectly designates as A. Two tests are to be compared on 
the basis of their comparative costs for the same utility. When 
a test can be used at any utility, an over-all measure of 
effectiveness is the mean-cost (M.C.) for all utilities. Derived from 
this is an index, the mean-cost-rating (M.C.R. = 1 - 2 M.C.) 
which like the correlation coefficient extends from zero to 
unity. The equation giving cost for utility is written for the 
case where x is normally distributed. The M.C. and M.C.R. 
are tabled for the case where the standard deviation of x is 
equal for A and B. Cost-utility relationship for such tests is 
depicted on a prob.-prob. scale. Comparison of normally 
distributed tests in terms of such representation is discussed. 


* Formerly Chief, Statistics Division, Office of the Air Surgeon, Medical Corps, Army of the 
United States. 


A FREQUENTLY occurring type of problem with which the statistician 
is confronted is to measure the effectiveness of a selective test, as 
exemplified in the following situation: In the United States Army Air 
Forces, the candidate for pilot training was subjected to a battery of 
different psychomotor aptitude tests; the results of these were combined 
to give a single score and the individual who achieved a score 
above a predetermined critical mark was accepted for training. In the 
experimental phases of the work concerned with setting up such a test, 
all the candidates may be allowed to enter training and after the passage 
of time it becomes known for each tested person whether or not he 
has been graduated successfully. With these results in hand (that is, 
the composite score for each candidate and whether he has or has not 
been graduated) it is desired to measure the efficacy of the battery of 
tests, or perhaps to compare the efficacy of two different batteries, for 
predicting the persons who will graduate or who will not. In the field of 
psychologic testing and in some other similar fields, it is customary to 
utilize for this purpose, the biserial correlation coefficient. That the 
biserial coefficient is an entirely satisfactory index in these situations 
may be questioned. It is open to objection on general conceptual 
grounds and its “meaning” in operational terms is not clear in relation 
to the observations which actually are made.¹ 


1 This coefficient has its basis in the linear product-moment correlation coefficient of Karl Pearson. 
If two normally distributed quantities x and y are linearly related and the product-moment correlation 
between them is ρ, then if for each value of x we are given, not the value of y, but only whether y is 
above or below a definite value y_0, the biserial coefficient r_b gives an estimate of ρ. Under these 
conditions, the biserial coefficient r_b can be used just as can the product-moment coefficient r, which is also 
an estimate of ρ, albeit a more precise one, but which requires for its calculation the actual values of y. 

But r, the product-moment coefficient, itself has limitations. For one thing, there is general 
agreement that it is not applicable unless the quantities x and y are linearly related, and careful statistical 
practice requires that a statistical test for linearity be applied before r is calculated. How is this to be 
accomplished in the absence of quantities representing the variable y? So far as I know such tests have 
not been applied or formulated. In the cases under discussion there is no actual continuous quantitative 
variable y, the linear relation of which to x can be questioned. A candidate is graduated or not 
graduated in these situations, not on the basis of a recorded quantity but largely on the basis of qualitative 
judgments of the instructors; if there were actual grades, it would be these for which the 
product-moment r could be calculated. To “imagine” an unrecorded quantitative grading, on the basis of which 
graduation was determined, as is done in didactic expositions of the biserial coefficient r_b, and further to 
“imagine” that this grade is distributed normally and is linearly correlated with the test score, when 
in fact there was no such grade observed, seems gratuitous. If the selective tests in question are not for 
the purpose of predicting whether a candidate will be graduated from training, but they are, say, tests 
for pregnancy, or for survival following operation, we shall have to “imagine” that the dichotomies 
“pregnant, not pregnant” and “living, dead” are measurable in degrees on a continuous scale. 

There are other considerations that trouble one in connection with the use of the biserial r_b. 
Consider the formula 

\[ r_b = \frac{M_g - M_f}{\sigma} \times \frac{pq}{z} \]

where r_b is the biserial correlation coefficient; M_g is the mean score for graduates; M_f is the mean score 
for those failing to graduate; σ is the standard deviation of the scores for both groups taken together; 
p is the fraction of graduates; q = 1 - p; z is the ordinate of the normal curve corresponding to p. To be 
noted is the following: pq/z depends on the fraction of the community tested that graduates, and is 
larger the nearer the fraction is to 0.5. If therefore, two tests, T_1 and T_2, were being compared and r_b2 
was found to be larger than r_b1, it could be because the second community had a more nearly equal 
ratio of graduates to nongraduates than the first community and not because T_2 was better than T_1. 
Only if the tests were tried in both instances on identical or equivalent populations could one be sure 
that the difference in r_b was attributable to the tests themselves. But if the tests compared are utilized 
with identical populations, pq/z is equal for all of them. This factor in the formula therefore would seem 
either to be misleading or to be superfluous for measuring the efficacy of different tests. The value of p 
on which the factor exclusively depends, relates to the population for which the test is used, and since 
this is a variable quantity which is independent of the test itself, it would seem better to state it 
separately for each case and when appropriate to use it for the calculation of different required quantities 
as exemplified in some formulas given in the text. In many instances moreover, as when a test is applied 
to an experimental group and control, there is no population discoverable to which the test can be 
considered uniquely attached, and hence no value of p is assignable for the calculation of r_b. It is significant 
that when a test is being evaluated by comparison of test results in two groups, sampled to represent 
the “success” and “failure” categories of the criterion, discerning workers avoid the calculation of r_b [1]. 

A procedure more directly related to such a set of observations is 
suggested by what is done in a similar situation in the field of medical 
serologic tests [2]. Given a number of serologic tests for a specific 
disease, how does the serologist evaluate their relative efficacy? A 
representative sample group of persons who are known to have the disease 
are subjected independently to each test and the fraction of positives 
shown by each test is determined; this is called the “sensitivity” of the 
test. At the same time a representative group of persons known to be 
free of the disease are similarly subjected to each of the tests and the 
fraction of negatives is determined; this is called the “specificity” of the 
test. These two numbers, the “sensitivity” and the “specificity,” 
properly determined, are taken to describe the test so far as the serologist is 
concerned. 

FIGURE 1. COST-UTILITY EVALUATION CHART. Pilot aptitude score, 7219 cases. 

Even with the sensitivity and specificity established, the question 
which of two tests is the better is not yet completely answered. 
Generally, as one increases sensitivity, one decreases specificity, so that of the 
two tests being compared, the sensitivity of the first test being higher 
than that of the second, the specificity of the second test is likely to be 
higher than that of the first. The importance of making an error among 
persons who have the disease must be weighed against the importance 
of an error among persons without the disease and together with this 
the relative number of persons with and without the disease in the 
community to be tested must be considered. It is significant that the 
serologist finds no single number with which to characterize the 
relationship of the results of the test to its effectiveness. He uses two 
numbers, the sensitivity and the specificity. But the serologist’s part of the 
problem is resolved when the sensitivity and specificity of the test are 
determined. 

FIGURE 2. COST FOR UTILITY AT VARIOUS LEVELS OF R. 

What is proposed here is simply to apply the line of thought of the 
serologist measuring the effectiveness of his tests, to the general 
problem being considered. Let us continue in terms of a score made from 
psychomotor aptitude tests which is to be used for selection of 
applicants for pilot training. If, using a certain critical mark in the score, we 
eliminate from candidacy for training those who have obtained a score 
below the critical mark, and allow into training those who have scored 
above this mark, then in so far as we have eliminated persons who 
would have failed in training, had they been allowed to participate, 
we have accomplished something; we have saved the labor and cost 
of training them. If in this manner we could have eliminated all 
the potential failures, we should, so far as this objective is concerned, 
have done all that it was possible to do. But even if all the potential 
failures had been eliminated by the test, we still have to consider how 
many potential pilots would have been eliminated; the fewer of these 
the better. If none of these had been eliminated, so far as this 
consideration is concerned, we should have done all that could be asked. If a 
test does not eliminate all potential failures, it is because some of the 
potential failures pass the critical mark; if it does eliminate potential 
pilots, it is because some of the potential pilots do not pass this mark. 

FIGURE 3. MEAN-COST AND MEAN-COST-RATING IN RELATION TO R. 

The objective, we may say, is to eliminate potential failures; let us 
call the fraction of potential failures who are eliminated by the test 
(fail to achieve the passing mark) the “utility” of the test (U). This 
corresponds to the sensitivity of the serologic test. But the effect of 
using the test is also to eliminate potential pilots; let us call the fraction 
of potential pilots who are eliminated by the test the “cost” (C). This 
is the complement of the “specificity” in the case of the serologic test. 
We have, then, two numbers to determine corresponding to the 
sensitivity and specificity of the serologic test, the “utility” and the “cost.” 
These are, respectively, the fraction of those who would have been 
failures in training who fail in the test, and the fraction of those who 
would have been successful trainees who fail in the test.² 

2 It is interesting that in many fields in which tests are used, analogous reasoning and similar 
conclusions have been reached independently, with the use of different terminology. I have referred to 
“sensitivity” and “specificity” of the serologist. Another instance is the use by Shewhart [3] of 
“producer’s risk” and “consumer’s risk,” in the employment of a test for material qualities of a 
manufactured product. In the field of clinical psychology it is noteworthy that authors find it desirable, when 
presenting the results of tests, to supplement the tetrachoric r by statements of the fraction correctly 
selected and the fraction of “false positives” [1]. 

The outstanding example of the use of the concept being advanced is the application and 
mathematical elaboration of it by Neyman and Pearson [4], in connection with statistical tests of significance. 
The employment of a statistical test of significance involves “errors of the first kind,” the probability 
of which corresponds to my “cost,” and the statistical test has a certain “power” which corresponds to 
my “utility.” In the recent development of sequential tests by Wald [5] the same idea is used, in so 
far as decisions are made so as to balance a stated probability of errors of the first kind against a stated 
probability of errors of the second kind. It is my hope that the present paper may help to crystallize 
the use and stimulate a mathematical exploration of the same basic principle in connection with other 
tests. 


The relations among some quantities frequently wanted may be set 
down. Symbolize the utility by U, the cost by C and the fraction of 
potential pilots in the community by P. 

1. Fraction of tested group which enters training; that is, passes the 
test 

\[ F_{t.e} = P(1 - C) + (1 - P)(1 - U). \]

2. Fraction of tested group which is graduated 

\[ F_{t.g} = P(1 - C). \]

3. Fraction of those passing the test which is graduated 

\[ F_{p.g} = \frac{P(1 - C)}{P(1 - C) + (1 - P)(1 - U)}. \]

4. Fraction of those failing the test which is graduated 

\[ F_{f.g} = \frac{PC}{(1 - P)U + PC}. \]

5. Number tested for each one who is graduated 

\[ N_t = \frac{1}{P(1 - C)}. \]

6. Number trained for each one who is graduated 

\[ N_{tr} = 1 + \frac{(1 - P)(1 - U)}{P(1 - C)}. \]
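
The six relations can be checked numerically. The sketch below is illustrative only: the class and field names are hypothetical, and the values P = 0.6, U = 0.5, C = 0.1 are invented for the example rather than taken from the pilot data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionOutcome:
    enters_training: float          # relation 1
    graduates: float                # relation 2
    graduates_among_passing: float  # relation 3
    graduates_among_failing: float  # relation 4 (potential graduates among those eliminated)
    tested_per_graduate: float      # relation 5
    trained_per_graduate: float     # relation 6


def selection_outcome(p: float, u: float, c: float) -> SelectionOutcome:
    """Evaluate relations 1-6 for potential-pilot fraction p, utility u, and cost c."""
    if not (0 < p <= 1 and 0 <= u <= 1 and 0 <= c < 1):
        raise ValueError("need 0 < p <= 1, 0 <= u <= 1, 0 <= c < 1")
    passing_pilots = p * (1 - c)          # potential pilots who pass the test
    passing_failures = (1 - p) * (1 - u)  # potential failures who pass the test
    failing = (1 - p) * u + p * c         # everyone eliminated by the test
    return SelectionOutcome(
        enters_training=passing_pilots + passing_failures,
        graduates=passing_pilots,
        graduates_among_passing=passing_pilots / (passing_pilots + passing_failures),
        graduates_among_failing=(p * c / failing) if failing > 0 else math.nan,
        tested_per_graduate=1 / passing_pilots,
        trained_per_graduate=1 + passing_failures / passing_pilots,
    )


if __name__ == "__main__":
    print(selection_outcome(p=0.6, u=0.5, c=0.1))
```

With these illustrative values, 74 per cent of those tested enter training, 54 per cent are graduated, and about 1.37 men are trained for each one who is graduated.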


If the utility is equal to the cost, we manifestly have no effective 
test at all, for such a test could be made by a roulette wheel which 
determined elimination merely on chance. We could accomplish a 
pre-elimination rate of 50 per cent of potential failures (U=0.5) merely 
by eliminating anyone for whom the roulette showed an even number. 
However, the cost would also be 0.5, since half of the potential pilots 
would be eliminated in the same procedure. In general the lower the 
cost for a given utility, the more we have accomplished, but the scale 
of accomplishment is still a matter for judgment and may depend on 
other factors. According to the viewpoint being advanced here, two 
tests are to be compared in relation to their relative costs at the same 
utility. If they are used each at a different value of utility, and for 
instance test T_1 as used has a higher utility and also a higher cost than 
another test T_2, the tests are incommensurable without the introduction 
of other factors of consideration. It would be necessary to apply a 
relative weighting factor to the utility and to the cost to account for the 
possibly greater importance of eliminating a potential pilot from 
training than of allowing a potential failure to enter training. The difference 
of the weighted quantities would measure what might be called “net” 
cost-utility and the two tests might be compared on the basis of this 
measure. 

The method to be used for the calculation of cost-utility will vary 
under different circumstances. If the scores for both the unsuccessful 
students and the graduates are normally distributed, the mean and 
standard deviation can be calculated for each, in the usual way, and 
the areas under the corresponding normal curves for the failures and 
graduates below different critical marks will give, respectively, the 
utility and cost corresponding to these marks. The cost for any value 
of utility is in this case given by 

\[ \mathrm{pr}\,C = \frac{\sigma_1}{\sigma_2}\,\mathrm{pr}\,U - \frac{\bar{x}_2 - \bar{x}_1}{\sigma_2} \tag{1} \]

where pr C and pr U are the probits³ (normal deviate) of the cost and 
utility, σ_1 and σ_2 are the standard deviations and x̄_1 and x̄_2 are the means 
of the scores for the nongraduates and graduates, respectively. In 
actual practice, one can disregard any assumption of form of 
distribution but simply plot the accumulated percentages for each of the two 
groups, smooth graphically and determine the cost-utility by graphic 
interpolation. An example of this is shown in Figure 1. 
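
As an illustration of formula (1), the sketch below converts a utility to its probit, applies the linear relation, and converts back to a cost. It then cross-checks the result by locating the critical mark directly. The function names and the score parameters are hypothetical; the normal-curve routines are those of Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def cost_for_utility(u, mean_fail, sd_fail, mean_grad, sd_grad):
    """Cost at utility u when both score distributions are normal, via the probit relation."""
    pr_u = STANDARD_NORMAL.inv_cdf(u)  # probit (normal deviate) of the utility
    pr_c = (sd_fail / sd_grad) * pr_u - (mean_grad - mean_fail) / sd_grad
    return STANDARD_NORMAL.cdf(pr_c)


def cost_via_critical_mark(u, mean_fail, sd_fail, mean_grad, sd_grad):
    """Same quantity found directly: the mark eliminating a fraction u of nongraduates,
    then the fraction of graduates scoring below that mark."""
    mark = NormalDist(mean_fail, sd_fail).inv_cdf(u)
    return NormalDist(mean_grad, sd_grad).cdf(mark)


if __name__ == "__main__":
    # Invented parameters: nongraduates N(4.5, 2), graduates N(5.5, 2).
    for u in (0.25, 0.50, 0.75):
        print(u, round(cost_for_utility(u, 4.5, 2.0, 5.5, 2.0), 4))
```

The two routes agree because the critical mark is x̄_1 + σ_1 pr U, and substituting this mark into the graduates' distribution gives the linear probit relation.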

In many cases, it is possible to set the utility at any value desired 
by adjusting the passing score. For such cases it is natural to think of 
measuring the over-all efficiency of the test in terms of the mean cost 
M for all utilities. If the scores are normally distributed and U and C 
are the utility and cost respectively at score x 

\[ U = \frac{1}{\sqrt{2\pi}\,\sigma_1} \int_{-\infty}^{x} e^{-(t - \bar{x}_1)^2 / 2\sigma_1^2}\, dt \]

\[ C = \frac{1}{\sqrt{2\pi}\,\sigma_2} \int_{-\infty}^{x} e^{-(t - \bar{x}_2)^2 / 2\sigma_2^2}\, dt \]

\[ M = \int_{x=-\infty}^{x=+\infty} C\, dU \]


If C is plotted against U on a probability-probability scale, the points 
will fall on a straight line. The slope of the line will be determined 
by the ratio of the standard deviations, and the position of the line by 
the ratio of the difference of means to the standard deviation of the 
graduates. The mean cost M will be the area under this curve (measured 
in terms of the utility and cost values, not their probits). For the 
particular case in which the standard deviations are equal, the slope 
will be unity and the position will depend only on R, the ratio of the 
difference of the means to the standard deviation. In Figure 2 are shown 
the family of cost-utility lines corresponding to different values of R, 
for the case in which the standard deviations are equal.⁴ 





3 The short term “probit” for the value z_0 in the equality 

\[ p = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_0} e^{-z^2/2}\, dz \]

has the same sort of usefulness that the term “logarithm” has. It was introduced by Bliss, who 
employed it with a constant added, to avoid negative numbers. I hope he will not find objectionable the 
use of the term when no constant is added. 

4 These were determined by calculating the cost for values of utility in accordance with formula (1) 
for progressive values of utility in steps of 0.01, and using Simpson's rule for approximate integration. 
A similar calculation can be made for the case in which the standard deviations are unequal. 

The mean utility will be 0.5, and the mean cost will vary between 
0.5, when there is no relation between score and success, and zero, 
when the discrimination by the test score is complete. An index 
analogous to the biserial r_b may be given as the mean-cost-rating, M.C.R. 

\[ \text{M.C.R.} = 1 - \frac{\text{M.C.}}{\text{M.U.}} = 1 - 2\,\text{M.C.} \tag{2} \]
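
The calculation described in the footnote (cost from formula (1) at utilities in steps of 0.01, integrated by Simpson's rule) can be sketched as follows for the equal-standard-deviation case, where the probit relation reduces to pr C = pr U - R. The function names are hypothetical. The closed form used as a cross-check, Φ(-R/√2), is not taken from the paper; it is the standard result for the probability that one normal variate falls below another independent one with the same standard deviation.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def cost_at_utility(u: float, r: float) -> float:
    """Cost for utility u with equal standard deviations: pr C = pr U - R."""
    if u <= 0.0:
        return 0.0  # limiting value; the probit is undefined at the endpoints
    if u >= 1.0:
        return 1.0
    return STANDARD_NORMAL.cdf(STANDARD_NORMAL.inv_cdf(u) - r)


def mean_cost(r: float, steps: int = 100) -> float:
    """Mean cost: integral of C dU from 0 to 1 by Simpson's rule (steps must be even)."""
    if steps < 2 or steps % 2:
        raise ValueError("steps must be a positive even number")
    h = 1.0 / steps
    total = cost_at_utility(0.0, r) + cost_at_utility(1.0, r)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * cost_at_utility(i * h, r)
    return total * h / 3


def mean_cost_rating(r: float, steps: int = 100) -> float:
    """M.C.R. = 1 - 2 M.C."""
    return 1 - 2 * mean_cost(r, steps)


def mean_cost_closed_form(r: float) -> float:
    """Cross-check (not from the paper): Phi(-R / sqrt(2))."""
    return STANDARD_NORMAL.cdf(-r / sqrt(2))


if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0):
        print(r, round(mean_cost(r), 3), round(mean_cost_rating(r), 3),
              round(mean_cost_closed_form(r), 3))
```

For R = 1.0 the sketch gives a mean cost of about 0.24 and a rating of about 0.52, in line with the tabled values.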


The value of M.C.R., like r_b, varies between the extremes of zero 
and unity, but differs from r_b in being independent of p, the fraction of 
graduates in the population. Thus while the M.C.R. is determinable for 
all cases in which r_b is determinable, the reverse is not true. The 
mean-cost and mean-cost-rating are shown in relation to R in Figure 3; 
numerical values are given in Table 1. 


TABLE 1 
MEAN COST AND MEAN COST RATING 

 R¹     M.C.²    M.C.R.³   |    R¹     M.C.²    M.C.R.³
 0.1    0.472    0.056     |    1.7    0.115    0.770
 0.2    0.444    0.112     |    1.8    0.102    0.796
 0.3    0.416    0.168     |    1.9    0.090    0.820
 0.4    0.389    0.223     |    2.0    0.079    0.842
 0.5    0.362    0.276     |    2.1    0.069    0.862
 0.6    0.336    0.329     |    2.2    0.060    0.879
 0.7    0.310    0.379     |    2.3    0.053    0.895
 0.8    0.286    0.428     |    2.4    0.046    0.909
 0.9    0.262    0.475     |    2.5    0.039    0.921
 1.0    0.240    0.520     |    2.6    0.034    0.932
 1.1    0.218    0.563     |    2.7    0.029    0.942
 1.2    0.198    0.604     |    2.8    0.025    0.950
 1.3    0.179    0.642     |    2.9    0.021    0.957
 1.4    0.161    0.678     |    3.0    0.018    0.963
 1.5    0.145    0.711     |    3.5    0.009    0.982
 1.6    0.129    0.742     |    4.0    0.005    0.990

1 R = difference of means in ratio to standard deviation. 

2 Mean cost = \int_{U=0}^{U=1} C\, dU 

3 Mean cost rating = 1 - 2 M.C. 

The relations between two tests T_1 and T_2 for each of which the 
scores are normally distributed can be visualized in terms of their 
representation on a prob.-prob. scale (fig. 2). Each will be represented 
by a straight line. If the standard deviations of the scores for graduates 
and nongraduates are equal (even if different for the two tests), the 
line for each test will have a slope of unity and if T_1 has a higher cost 
than T_2 for any specific utility it will necessarily have a higher cost at 
all other values of utility, and a higher mean cost. If the standard 
deviations are unequal for graduates and nongraduates but their ratio 
is the same for T_1 and T_2, the lines will have a slope other than unity 
but they will be parallel. Here again if the cost for T_1 is higher than for 
T_2 at some utility it will be higher at all other utilities and have a higher 
mean-cost. If the standard deviations are not equal and for two tests 
the ratios of the standard deviations are different, the cost-utility lines 
will not be parallel on the prob.-prob. scale, and therefore the two lines 
representing the tests may cross at a value of utility within the working 
range. The cost for fixed utility of the one test will be higher than that 
of the other above this utility, and lower for values below this utility, 
though the mean-cost may be definitely higher for one of the tests. 
Similarly in such situations the M.C. and M.C.R. may be identical 
for two tests, but one test may have the lower cost over the entire 
working range of utility. Whether or not such conditions will be met often 
in practice, these considerations illustrate the possible fallaciousness 
that is inherent in using any single over-all index as a measure of 
effectiveness of a test. 
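
The crossing described above can be located directly from the probit lines. In the sketch below each test is reduced to a slope (ratio of the nongraduate to the graduate standard deviation) and an offset (difference of means in ratio to the graduate standard deviation), following formula (1). The function names and both sets of parameters are invented for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def probit_line(mean_fail, sd_fail, mean_grad, sd_grad):
    """Slope and offset of the cost-utility line: pr C = slope * pr U - offset."""
    return sd_fail / sd_grad, (mean_grad - mean_fail) / sd_grad


def cost(u, line):
    """Cost at utility u for a test described by its (slope, offset) pair."""
    slope, offset = line
    return STANDARD_NORMAL.cdf(slope * STANDARD_NORMAL.inv_cdf(u) - offset)


def crossing_utility(line_a, line_b):
    """Utility at which two cost-utility lines cross, or None if they are parallel."""
    (slope_a, offset_a), (slope_b, offset_b) = line_a, line_b
    if slope_a == slope_b:
        return None
    return STANDARD_NORMAL.cdf((offset_a - offset_b) / (slope_a - slope_b))


if __name__ == "__main__":
    test_a = probit_line(0.0, 1.0, 1.0, 1.0)  # equal standard deviations, slope 1
    test_b = probit_line(0.0, 1.0, 1.5, 2.0)  # graduates more variable, slope 0.5
    u_star = crossing_utility(test_a, test_b)
    print("lines cross at utility", round(u_star, 3))
    for u in (0.2, u_star, 0.95):
        print(round(u, 3), round(cost(u, test_a), 3), round(cost(u, test_b), 3))
```

In this invented case the first test has the lower cost below the crossing and the second has the lower cost above it, so which test is preferable depends on the utility at which it will be used.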
NEW INDEXES OF HOURLY AND WEEKLY EARNINGS 
COMPILED BY THE FEDERAL RESERVE BANK 
OF NEW YORK 


George Garvy and Robert E. Lewis¹ 
Federal Reserve Bank of New York 

This paper presents a set of index numbers of hourly and 
weekly earnings of nonagricultural workers, prepared by the 
Federal Reserve Bank of New York. These indexes combine 
wage statistics from Government and private sources in an 
attempt to furnish the measure of over-all movements in the 
general level of earnings which will be the best possible within 
the limitations of available data. These limitations are 
discussed, together with other aspects of the problem, such as the 
effect of supplementary wage practices on measurement of 
earnings. 


I. HISTORY OF THE INDEX 


THE Federal Reserve Bank of New York has recently completed 
work on a set of indexes of hourly and weekly earnings in 
nonagricultural occupations which will supplant the “Composite index of 
wages” published by the bank since February 1938. This is the latest 
of a series of efforts by the bank’s Research Department to develop an 
over-all measure of fluctuations in wages, the first of which appeared in 
this Journal in June 1924.² 

When, more than twenty years ago, the Federal Reserve Bank of 
New York began to calculate a wage index, available information on 
wage rates and earnings was scattered and extremely limited. Profes- 
sor Douglas’ book was not published until six years later, and the 
Bureau of Labor Statistics did not begin the regular monthly collection 
and release of earnings statistics until the Great Depression. The first 
composite index of wages of the Federal Reserve Bank (compiled in 
1924 under the direction of Carl Snyder) contained only four series. 
After the 1926 revision the index comprised seven series, including such 
varied components as weekly and monthly earnings, hourly wage rates, 
weekly hiring rates, and annual salaries, depending on the type of data 
available. The wage index, as first conceived, was one component of an 
index of the general price level, compiled by the bank and used among 
other purposes to deflate clearings and other economic series 
extending over a long time period. As Snyder’s attention was centered on 
broad price movements and long term trends, he aimed primarily at a 
wage index reflecting general changes in the wage level. Heterogeneity 
of the basic data, resulting partly from differences in the customary 
base of compensation in different occupations, was the price paid for 
obtaining the broadest coverage possible at the time. 

1 Early work on these indexes was conducted by Eva Mueller. Valuable assistance in compilation 
of the data was rendered by Marilyn Haggerty, Genevieve Matteson, Virginia Ormond, Mary 
O’Shaughnessy, and other members of the staff of the Research Department. 

2 Carl Snyder, “A New Index of the General Price Level from 1875,” Journal of the American 
Statistical Association, Volume 19, pp. 189-95. See also Carl Snyder, “A New Composite Index of Wages 
in the United States,” Journal of the American Statistical Association, Volume 21 (1926), pp. 466-70. 

The index was revised in 1932 when five additional series covering 
weekly earnings in nonmanufacturing industries were added, while the 
series for unskilled labor—the only series compiled from data collected 
by the Reserve Bank itself*—was dropped and replaced by a series on 
hourly wage rates of common labor in roadbuilding. Most of the com- 
ponent series were adjusted for seasonal variation. The basic character 
of the index was not affected by the 1932 revision, however, as it con- 
tinued to amalgamate wage rate and earnings series of various types, 
with average weekly earnings predominating. 

As additional data became available in the course of time and de- 
velopments during the depression years tended to accentuate the gap 
between wage rates and actual weekly earnings, a more
rigid definition of the scope of the index appeared necessary. The last
previous revision, made in 1938, aimed at an index which would isolate 
changes in the rate of pay from fluctuations in the number of hours 
worked. The index resulting from this revision and published there- 
after was essentially an index of average hourly earnings (since adequate 
data for a comprehensive index of hourly wage rates were not available), 
although in some instances weekly earnings series had to be retained. 
The base year was shifted from 1913 to 1926 and two different sets of 
weights were applied, one for 1919 to 1932 and another beginning with 
1933. Between the two major revisions and thereafter, the index was 
improved by several minor substitutions in component series. 


II. SCOPE OF THE INDEX 


The present revision was undertaken in order to extend the coverage, 
particularly with respect to clerical and government workers, by in- 
corporating new data on earnings which have recently become avail- 
able, and to broaden the scope of the indexes by affording a comparison, 
where possible, between weekly and hourly earnings. In the process of 


* See W. Randolph Burgess, “Index Numbers for the Wages of Common Labor,” Journal of the 
American Statistical Association, Volume 18 (March 1922), pp. 101-03. 
revision it was decided to present a set of indexes rather than one single
index. 

In general, four types of indexes derived from wage statistics may be 
of use in over-all economic analysis: 

1. An index of wage rates, measuring the unit price of labor during the 
normal (or legal) number of weekly hours. Wage rate data are usually 
available for specific skills and job classifications only. Unfortunately, 
lack of adequate data precludes construction of a monthly index of 
wage rates with a wide scope which would cover a sufficiently long span 
of time. 

2. An index of average hourly earnings, reflecting the average return 
to labor per unit of time worked. In addition to changes in basic wage 
rates, average hourly earnings are affected by the average amount of 
additional payments per hour worked resulting from premium rates 
for overtime, shift differentials, and other supplementary payments. 
Data on hourly earnings are usually derived by dividing total payrolls 
by total man-hours, and therefore automatically reflect the shifting 
composition of the employed labor force among the various job clas- 
sifications in each individual industry. 

3. An index of average weekly earnings, combining the effects of 
changes in the actual length of the work week with those of fluctuations 
in average hourly earnings. 

4. An index of average annual earnings, reflecting the degree of em- 
ployment during the year as well as the level of average weekly earn- 
ings. 

The usefulness of each of these four types of indexes depends on 
the objective. Wage rates are the best basis of comparison of inter- 
industry or interregional differentials in the basic rate of pay. When 
comparing changes in the compensation of labor per time unit worked, 
average hourly earnings are the most appropriate yardstick. In most 
industries average hourly earnings also closely approximate the cost to 
employers of a time unit of labor from which unit labor cost may be 
computed.⁴ Average weekly earnings reflect current changes in labor
income, while average annual earnings also take into account the in- 
fluence of layoffs or unemployment during the year. 
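The difference between a wage-rate measure and average hourly earnings can be illustrated with a small sketch. The job classes, rates, and hours below are hypothetical and are not taken from the indexes. The sketch only shows how dividing total payrolls by total man-hours lets the average move when the composition of employment shifts, even though no wage rate changes.

```python
# Hypothetical illustration: average hourly earnings computed as
# total payrolls divided by total man-hours.

def average_hourly_earnings(jobs):
    """jobs: list of (hourly_rate_in_dollars, man_hours) pairs for one industry."""
    payroll = sum(rate * hours for rate, hours in jobs)
    man_hours = sum(hours for _, hours in jobs)
    return payroll / man_hours


def average_weekly_earnings(hourly_earnings, weekly_hours):
    """Weekly earnings combine hourly earnings with the length of the work week."""
    return hourly_earnings * weekly_hours


# Two job classifications; the wage rates are identical in both periods.
period_1 = [(0.60, 700), (1.00, 300)]  # mostly lower-paid work
period_2 = [(0.60, 500), (1.00, 500)]  # employment shifts toward the higher-paid class

ahe_1 = average_hourly_earnings(period_1)  # about 0.72 dollars
ahe_2 = average_hourly_earnings(period_2)  # about 0.80 dollars

# A longer work week raises weekly earnings further (hours are hypothetical).
awe_1 = average_weekly_earnings(ahe_1, 38.0)
awe_2 = average_weekly_earnings(ahe_2, 45.0)
```

In this sketch hourly earnings rise by about 11 per cent with no change in any wage rate, which is the composition effect that an index of wage rates would exclude and an index of hourly earnings includes.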

Historical data for a broad index of wage rates covering the entire 
nonagricultural sector of the economy cannot be secured. Compre- 
hensive data on annual earnings are available as a by-product of 


4 The employers’ share of the cost of payroll taxes, workmen’s compensation, and insurance and
retirement plans represents the difference between the cost of a time unit of labor and average hourly
earnings.
national income estimates. We have attempted to combine available
data for individual segments of the economy into an over-all index of
weekly earnings and an index of hourly earnings with the broadest pos-
sible coverage which could be compiled currently and promptly on a
monthly basis.

TABLE 1

COMPOSITION OF INDEXES OF HOURLY AND WEEKLY EARNINGS
IN NONAGRICULTURAL INDUSTRIES

                                              Weight
Manufacturing, wage earners*                    27.2
Mining                                           2.8
    Bituminous coal mining*
    Crude petroleum†
    Metal mining
    Anthracite coal mining
    Quarrying and nonmetallic mining
Public utilities                                10.7
    Railroads, wage earners†                     4.3
    Street railways and busses*                  2.9
    Electric light and power                     1.9
    Telephone                                    1.6
Construction†                                    3.8
Trade and service                               24.6
    Retail trade†                               13.7
    Wholesale trade                              6.9
    Hotels                                       2.3
    Power laundries                              1.0
    Cleaning and dyeing*                         0.7
Total wage earners                              69.1
Clerical and professional                       30.
    State and local government (nonschool)*      7.1
    Insurance*                                   6.6
    Manufacturing, clerical workers              6.3
    Federal Government*                          4.1
    Teachers                                     4.0
    Brokerage*                                   1.5
    Railroads, clerical workers*                 1.4
Composite index of wages and salaries          100.0

* Adjustments for seasonal variation applied to weekly earnings series only.
† Adjustments for seasonal variation applied to both weekly and hourly earnings series.



The new index of weekly earnings in nonagricultural occupations is 
composed of two segments: “wage earners” and “clerical and pro- 
fessional workers.” The “wage earners” segment is further subdivided 
into “manufacturing,” “mining,” “public utilities,” “construction,” and 
“trade and service,” as shown in Chart I. A complete list of component 
series and the weight assigned to each is shown in Table 1. 
Because the underlying data in some cases include both production
and office workers, it was not possible to limit the series for wage 
earners to employees paid solely on an hourly basis. Data for manufac- 
turing, mining, railroads, power laundries, cleaning and dyeing, and 
construction cover no office workers. In the remaining six component 
series of this index, which have a combined weight of 29.3, the number 
of office workers is less than one fourth of the total. Production workers, 
therefore, actually account for between 90 and 95 per cent of total 
employees covered by the indexes of weekly and hourly earnings of 
wage earners. Similarly, in the clerical and professional group, data for 
Government employees include a small proportion of manual workers. 
A large group of such workers (United States Navy Yard employees) 
has been excluded. Lack of appropriate data has precluded elimination 
of wage earners in Government arsenals, force account construction 
labor, and various types of municipal service employees, such as sanita- 
tion and park department laborers. 
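The arithmetic behind the combined weight of 29.3 and the lower end of the production-worker share can be checked with a short sketch. The weights are those listed in Table 1. Treating a weight (which is based on wage and salary income) as proportional to the number of employees is an assumption made here only for illustration.

```python
# Table 1 weights for the six wage-earner series whose data include
# some office workers (all other wage-earner series cover none).
mixed_series = {
    "Street railways and busses": 2.9,
    "Electric light and power": 1.9,
    "Telephone": 1.6,
    "Retail trade": 13.7,
    "Wholesale trade": 6.9,
    "Hotels": 2.3,
}
total_wage_earner_weight = 69.1  # "Total wage earners" in Table 1

combined = round(sum(mixed_series.values()), 1)  # 29.3

# Office workers are stated to be less than one fourth of these six series.
# ASSUMPTION: weights are treated as proportional to employment.
max_office_share = 0.25 * combined / total_wage_earner_weight
min_production_share = 1 - max_office_share  # a little over 0.89
```

Under that assumption production workers make up at least about 89 per cent of the wage-earner group, which is in line with the lower end of the range stated in the text.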

For hourly earnings, data were available and indexes were computed 
for all subdivisions of the “wage earners” group (Chart II). In clerical 
and professional occupations, however, it was not feasible to construct 
an index of hourly earnings. Since these employees are customarily paid 
by the week or by the month rather than by the hour, current data on 
the number of hours worked are required to arrive at hourly earnings. 
Information on changes in average hours worked is so sketchy, how- 
ever, that it is not possible to convert weekly (or monthly, or annual) 
earnings into average hourly earnings. In the long run the number of 
hours worked in clerical occupations is subject to approximately the
same downward trend as in industry as a whole; cyclical fluctuations in 
hours worked are generally smaller than for wage earners. If it were 
possible to compute an index of clerical hourly earnings, the general 
pattern in peacetime years would probably follow very closely that of 
the clerical weekly earnings index, inasmuch as such factors as length- 
ening of working hours would be important only in the war years, 
1942-45. 

The index of hourly earnings for wage earners and the index of weekly 
earnings in clerical occupations (a close approximation of an index of 
hourly earnings for this group) are combined in a composite index of 
wages and salaries (Chart III). This index was compiled mainly to permit 
splicing of the new indexes with the index of average earnings for the 
years 1913 to 1939 previously published by the Federal Reserve Bank of 
New York. 
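As a rough check on how the composite is formed, the sketch below combines the two annual group indexes for 1946 shown in Table 2. The clerical weight of 30.9 is an assumption, taken as the remainder of the 100.0 total after the wage-earner weight of 69.1. The published index is built from the individual monthly component series, so a calculation from rounded annual group figures can only approximate it.

```python
# Weights: wage earners from Table 1; the clerical and professional weight
# is ASSUMED to be the remainder of the 100.0 total.
wage_earner_weight = 69.1
clerical_weight = 100.0 - wage_earner_weight

# Annual averages for 1946 from Table 2 (1939 = 100).
hourly_wage_earners_1946 = 164  # average hourly earnings, wage earners
weekly_clerical_1946 = 148      # average weekly earnings, clerical and professional

composite_1946 = (
    wage_earner_weight * hourly_wage_earners_1946
    + clerical_weight * weekly_clerical_1946
) / 100.0

rounded_composite_1946 = round(composite_1946)  # Table 2 shows 159 for 1946
```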

III. COVERAGE


The new indexes do not cover either domestic servants or many 
professional employees,⁵ owing to the lack of any regular or reliable
wage information in these fields. Members of the Armed Forces, per- 
sons on work relief, and self-employed persons are also excluded, since 
they are not ordinarily counted as civilian nonagricultural employees. 
Wages of agricultural workers were not included in the index in the 
interest of homogeneity and because of weighting problems resulting 
from the extreme seasonal fluctuations in farm payrolls.* Because of 
this seasonal movement, the large numbers of temporary and migra- 
tory workers employed, and the importance of payments in kind, agri- 
cultural wages are extremely difficult to measure. The present index of 
the Bureau of Agricultural Economics is a composite of daily and 
monthly wage rates, both with and without board. Furthermore, it is
a “judgment” series based on the crop reporter’s estimates of wage 
rates prevailing in his locality rather than a report of wages actually 
paid, and no wage information is collected from such important em- 
ployers of agricultural labor as fruit, truck, and dairy farmers. 


IV. SOURCES


Most of the basic data for these indexes were derived from Govern-
ment sources. The United States Bureau of Labor Statistics was the
principal source, with the Employment Statistics Division furnishing 
data for sixteen components; two others were obtained from the Divi- 
sion of Construction and Public Employment. Railroad workers’ earn- 
ings were derived from reports of the Interstate Commerce Commis- 
sion; data on State and local government wages came from the U. S. 
Bureau of the Census; teachers’ salaries were obtained from surveys of 
the United States Office of Education and the National Education 
Association. Earnings of clerical workers in the manufacturing in- 
dustry were computed from surveys of the New York State Depart- 
ment of Labor and confidential Government reports, supplemented 
by data from State agencies in Ohio and Pennsylvania.⁷


5 The index covers teachers and professional workers employed by Federal, State, and local govern- 
ments, but employees of law offices, hospitals, engineering firms, religious organizations, etc., are not 
covered. Independent professional practitioners are excluded as self-employed persons. 

* Since the vast majority of those engaged in farm work are self-employed or members of families 
drawing no fixed wage, the weight of farm wages would have averaged less than 3 per cent in the com- 
posite index. 

7 Ohio State Department of Industrial Relations, Division of Labor Statistics, and Commonwealth 
of Pennsylvania, Department of Internal Affairs, Bureau of Statistics. 

V. WEIGHTS


The relative weights used in combining the indexes for the individual 
series were derived from United States Department of Commerce data 
on wage and salary income by industries for the last prewar year 
(1939). The index therefore does not actually reflect the average hourly 
and weekly earnings of the labor force as it was employed during the 
war years in those segments of the economy covered by the present 
indexes, but rather changes which would have taken place if the dis- 
tribution of the labor force among the main branches of economic ac- 
tivity (more precisely among the individual industries listed in Table 1) 
had remained the same as in 1939. Had shifting weights been used 
throughout, the composite index would undoubtedly have been higher 
during the war years, owing to the increased importance of manufac- 
turing, particularly the durable goods industries. Shifting weights were 
used experimentally to obtain an annual series, resulting in a differential 
in 1944 of about 3 per cent in the composite index and about 5 per cent 
in the total weekly index. As the economy began to revert to its peace- 
time proportions, the difference narrowed to 1 per cent in 1945 for the 
composite index and 2 per cent for the total weekly index. As the use of 
shifting weights involves continuous revisions and would delay release 
of the indexes, it was decided to use constant weights. As soon as com- 
prehensive data on the postwar industry distribution of wages and 
salaries become available, the weights will be reappraised and revised
where necessary. 
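The effect of constant versus shifting weights can be sketched with the group figures in Tables 1 and 2. The fixed-weight calculation uses the 1939 weights from Table 1 and the 1944 hourly indexes from Table 2. The "wartime" weights are purely hypothetical, chosen only to show the direction of the effect when manufacturing gains in importance. The actual indexes are compiled from the 16 component series rather than from these five rounded group figures.

```python
def weighted_index(values, weights):
    """Weighted arithmetic mean of component indexes."""
    total_weight = sum(weights[name] for name in values)
    return sum(values[name] * weights[name] for name in values) / total_weight


# 1939 weights for the wage-earner groups (Table 1).
weights_1939 = {
    "manufacturing": 27.2,
    "mining": 2.8,
    "public utilities": 10.7,
    "construction": 3.8,
    "trade and service": 24.6,
}

# Average hourly earnings indexes for 1944 (Table 2, 1939 = 100).
hourly_1944 = {
    "manufacturing": 161,
    "mining": 135,
    "public utilities": 127,
    "construction": 141,
    "trade and service": 139,
}

fixed = weighted_index(hourly_1944, weights_1939)  # about 145.8; Table 2 shows 146

# HYPOTHETICAL wartime weights: ten points moved from trade and service
# to manufacturing. These are not figures from the article.
weights_wartime = dict(weights_1939)
weights_wartime["manufacturing"] += 10.0
weights_wartime["trade and service"] -= 10.0

shifted = weighted_index(hourly_1944, weights_wartime)  # higher than `fixed`
```

Because manufacturing earnings rose faster than the average, any reweighting toward manufacturing lifts the aggregate, which is the direction of the differential described above.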

In cases where the Bureau of Labor Statistics publishes both in- 
dividual industry data and a combined figure for the entire industry 
group, the more inclusive figures are used.⁸ This procedure reduces
the number of series to be combined for the indexes of hourly and 
weekly earnings of wage earners from 164 series to the 16 shown in 
Table 1. The effects of shifts in the relative importance of individual 
industries are reflected in the inclusive BLS series used. Each com-
ponent index in Table 1 thus reflects intraindustry shifts and only the
effects of interindustry shifts are excluded by the use of constant 
weights. 

As earnings data are not currently available for some industries their 
weight has been assigned to related activities; e.g., the weight for bank- 
ing has been assigned to insurance. Imputed weights in the indexes of 
both hourly and weekly earnings account for 8.4 per cent of the total. 


8 Industry group series used included manufacturing (combination of 134 individual industries), 
retail trade (6 series), private building construction (9 series), and metal mining (3 series).

VI. SEASONAL ADJUSTMENTS


Each of the component series was carefully examined to determine 
whether adjustment for seasonal variation was called for and seasonal 
adjustment factors were worked out wherever necessary. Several series 
which are available on a monthly basis only, namely railroad wage 
earners and clerical workers, State and local government workers, and 
Federal Government employees,* were included in the weekly earnings 
index because it was felt that the effect of the variation in the number of 
weeks in the month was adequately corrected by the seasonal adjust- 
ment factors. In the case of manufacturing clerical workers, weekly 
earnings data prior to October 1945 were available only for the month 
of October each year, and the other monthly figures had to be estimated 
by interpolation. Teachers’ annual salaries were put on a daily basis by 
dividing by the average number of days in the school year to adjust for 
changes in the length of the school year. Indexes derived from school- 
year averages were placed at January and data for intervening months 
estimated by interpolation. It was judged that the advantages of the 
broadest possible coverage outweighed the disadvantage of relatively 
small errors introduced by this estimating procedure. 
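The two estimating devices described in this section can be sketched as follows. The text says only that intervening months were "estimated by interpolation"; the straight-line method used here is an assumption, and all dollar figures and school-year lengths are hypothetical.

```python
def interpolate_between_benchmarks(start_value, end_value, steps=12):
    """Straight-line interpolation from one annual benchmark to the next.

    Returns `steps` values: the opening benchmark followed by estimates for
    the intervening months. The closing benchmark is excluded because it
    opens the following year's run.
    """
    increment = (end_value - start_value) / steps
    return [start_value + increment * m for m in range(steps)]


def daily_salary(annual_salary, school_days):
    """Put an annual salary on a daily basis, given the school-year length."""
    return annual_salary / school_days


# HYPOTHETICAL October benchmarks for clerical weekly earnings (dollars).
october_first_year, october_next_year = 40.00, 43.00
monthly = interpolate_between_benchmarks(october_first_year, october_next_year)

# HYPOTHETICAL teacher salary: same annual pay, shorter school year.
longer_year = daily_salary(1440.0, 180)   # 8.0 dollars per day
shorter_year = daily_salary(1440.0, 160)  # 9.0 dollars per day
```

The daily-basis conversion shows why the adjustment matters: with annual pay unchanged, a shorter school year raises pay per day worked.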


VII. SUPPLEMENTARY WAGE PRACTICES AND LABOR COSTS 


It should be pointed out that an index of current money earnings is 
not a perfect measure either of the total remuneration of employees or 
of the total cost of labor to employers. Supplementary wage practices 
provide wage earners with non-money and/or deferred income. Gen- 
erally the cost of such practices is either partly or entirely met by em- 
ployers and thus has to be added to the direct cost represented by 
payrolls. Paid vacations, sick leaves, and other provisions also increase 
the remuneration per unit of time actually worked and correspondingly 
raise the cost of labor to employers. A two weeks’ paid vacation cor- 
responds roughly to a 4 per cent increase in hourly as well as weekly 
earnings. Prior to 1937 only 4 per cent of the companies covered in a 
National Industrial Conference Board survey included a vacation 
clause in their collective bargaining contracts; by 1946 the proportion 
had risen to 88 per cent. As paid vacations are granted after a certain 
length of continuous employment not all employees benefit from such 
provisions. Furthermore, as the number of paid vacation days frequently 
depends on the duration of employment, indexes of average earnings 


* Federal Government data for employees of the Executive Branch were adjusted to exclude em-
ployees of U. S. Navy shipyards and personnel outside the continental United States.
cannot be readily adjusted for this factor. Also of increasing importance
are provisions for continuation of income during illness and health- 
benefit plans supported by substantial employer contributions. While 
these contributions are not part of the workers’ current money income, 
they should be considered as supplements. Collective bargaining con- 
tracts including such plans covered more than 600,000 workers in 
August 1945, and recent contract negotiations, particularly in coal 
mining, have raised the coverage to well over a million. Other “fringe 
adjustments” which increase pay per actual hour worked, such as 
“portal-to-portal pay,” also escape precise measurement. Similarly,
the employers’ contribution to the Social Security system, to various 
types of company insurance, retirement, and similar plans might be 
regarded as supplements to money wages. The importance of such 
additional indirect or supplemental compensation has been increasing 
considerably during recent years, but again there is not sufficient in- 
formation available for even rough adjustments. Although such sup- 
plements to money wages may be large enough in some individual 
industries to lift the earnings index slightly, they are practically non- 
existent in other parts of the economy, so that their net impact is prob- 
ably too small to affect the validity of over-all comparisons based on the 
present set of indexes. 
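The statement that a two weeks' paid vacation corresponds roughly to a 4 per cent increase in earnings can be reproduced with one line of arithmetic. The sketch assumes a 52-week year in which pay is received for all 52 weeks but work is performed in only 50. This is one plausible reading of the figure, not necessarily the calculation made by the authors.

```python
# ASSUMPTION: a 52-week year, with two of those weeks paid but not worked.
weeks_paid = 52
vacation_weeks = 2
weeks_worked = weeks_paid - vacation_weeks

# Pay per week actually worked, relative to pay per week with no paid vacation.
increase = weeks_paid / weeks_worked - 1  # 0.04, i.e. about 4 per cent
```

The same ratio applies to hourly earnings when the vacation is paid at the regular weekly rate, since hours worked fall in proportion to weeks worked.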


VIII. INTERPRETATION OF INDEXES 


The indexes of hourly and weekly earnings presented in this paper 
are indicative of over-all movements in the general level of earnings. 
Their main purpose is to reveal the drift of earnings in the economy as 
a whole, not to provide a guide for policy decisions. No index of the 
nature of the two series presented in this paper can provide a guide for 
appraising price-cost relationships in individual industries, nor can it 
be readily translated into dollar-and-cents terms. More specifically, 
because of the inclusion of Government employees, the index does not 
reflect changes in earnings of workers in the private segment of the 
economy alone. Government workers have been included because the 
Government is one of the largest employers in the nation, and Federal 
and other government pay rates exert a competitive pressure on the 
wage structure. Also, in order to isolate underlying movements from 
seasonal fluctuations in the composition of the labor force in individual 
industries (such as the December influx of low-paid and part-time em- 
ployees in retail trade and post offices), those components which clearly 
exhibit seasonal patterns have been adjusted. The indexes of earnings 
presented here, therefore, cannot be interpreted as reflecting the actual
amount of earnings in any given month. 


IX. AVERAGE EARNINGS, 1938-46 


Monthly index numbers, using the 1939 average as a base, have been
computed from January 1938 to date for each of the groups and sub-
groups. Because several series are available in their present form only
since January 1938, it was not feasible to carry the monthly indexes
back beyond that date. Work is in progress, however, on long term an-
nual indexes of both hourly and weekly earnings which will be as closely
comparable as possible to the new monthly series.

TABLE 2

INDEXES OF HOURLY AND WEEKLY EARNINGS IN
NONAGRICULTURAL INDUSTRIES
1939 average = 100 per cent

                                                    Annual averages
                                         1938 1939 1940 1941 1942 1943 1944 1945 1946

Average hourly earnings indexes
  Wage earners                             99  100  103  110  124  136  146  150  164
    Manufacturing                          99  100  104  115  135  152  161  162  171
    Mining                                 98  100  101  110  119  129  135  141  158
    Public utilities                       99  100  101  103  113  118  127  131  147
    Construction                           97  100  103  108  124  134  141  148  158
    Trade and service                     100  100  102  107  117  128  139  147  165

Average weekly earnings indexes
  Wage earners                             96  100  103  114  133  151  163  164  171
    Manufacturing                          93  100  106  124  154  181  193  186  183
    Mining                                 93  100  102  116  135  159  184  187  198
    Public utilities                       98  100  102  106  119  130  142  147  157
    Construction                           96  100  104  116  139  159  172  178  184
    Trade and service                      99  100  101  105  114  124  134  142  159
  Clerical and professional                99  100  101  105  112  125  133  138  148

Average weekly earnings, all groups        97  100  103  111  126  143  153  156  164

Composite index of wages and salaries*     99  100  102  109  120  133  142  146  159

* Weighted average of average hourly earnings of wage earners and average weekly earnings of
clerical workers.

In Table 2, annual averages, 1938 through 1946, are shown for each
of the earnings indexes, with the 1939 average as the base.¹⁰

10 Tabulations of the monthly indexes and information on composition, sources, and weights are
available on request from the Research Department, Federal Reserve Bank of New York. The com-
posite index of wages and salaries will be published each month in the bank’s Monthly Review of Credit
and Business Conditions, and the complete set of indexes appears in a special monthly release.
The “Composite index of wages and salaries” presented
here follows a course in 1938-40 very similar to that of the “Composite
index of wages” formerly presented by this bank (carried in the 
Monthly Review as “Wage rates”). Starting in 1941 the old index rises 
more rapidly during the war period, mainly because of three factors: 
(1) the old index assigned a heavier weight to manufacturing which ad- 
vanced faster than the average, (2) the old index included farm wages 
(not covered by the new index) which have risen sharply during the 
war, and (3) the new index includes a greater number of series on earn- 
ings of clerical workers than the old index, and these rose more slowly 
than the average. 

During the period of expanding war production, 1940 through 1943, average
weekly earnings in manufacturing increased at a rate 2½ times
as great as that of the other groups combined, and by the early part 
of 1945 earnings in this group were nearly double the 1939 level. 
After V-J Day, loss of overtime work at premium rates and shifts from 
jobs at high wages in war plants to work at more moderate rates in 
civilian industry caused the index to drop 13 per cent in three months. 
These factors have continued to offset to some extent the increase in 
wage rates, so that a year after V-J Day average weekly earnings in 
manufacturing were still 5 per cent below the wartime peak, although 
the other groups in the index had all surpassed their wartime record 
levels. Weekly earnings in mining are characterized by extreme de- 
pressions caused by coal strikes and peaks occasioned by overtime work 
in anticipation of strikes and in making up lost production after strikes. 
Weekly earnings in most industries in the public utilities group were 
already comparatively high in 1939; their increase during the war was 
50 per cent, and following a slight recession in the reconversion period, 
the index has risen to a level substantially above that attained during 
the war. In trade and service industries, wartime gains were moderate, 
but intraindustry shifts and postwar loss of overtime payments were 
not important factors after V-J Day and weekly earnings rose steadily 
to a February 1947 level 18 per cent higher than that prevailing in 
July 1945, the largest gain of any group for this period. Average weekly 
salaries of clerical and professional workers rose a little over one-third 
during the war, and in November 1946 were about 9 per cent above the 
level prevailing at the end of the war. 

Hourly earnings in manufacturing rose during the war to a point 65 
per cent above the 1939 level, but had declined 6 per cent by October 
1945 as after V-J Day they became less and less affected by premium 
rates for overtime and high wage rates in war industries. Following the 
series of wage disputes early in 1946, hourly earnings in this group
rose by March 1947 to a new record level, 14 per cent above the war- 
time peak. Hourly earnings of workers in the public utilities, mining, 
construction, and trade and service groups did not rise as rapidly 
during the war as did those of factory workers. However, hourly earn- 
ings of these groups were not adversely affected by the end of the war 
and by February 1947 hourly earnings had advanced above the level 
of the last wartime month by 12 per cent in construction, 22 per cent 
in trade and service, 18 per cent in public utilities, and 19 per cent in 
mining. 

During the war, average weekly earnings as a whole rose faster than 
hourly earnings, because of the increased working hours. The return 
to a normal peacetime work week after the end of the war caused 
weekly earnings to drop off sharply and it was not until June 1946 that 
the index of wage earners’ weekly earnings passed its wartime peak, 
while the index of wage earners’ hourly earnings receded only slightly 
after V-J Day and by December 1945 was at a new record level. 
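Since weekly earnings combine hourly earnings with the length of the work week, dividing the weekly index by the hourly index gives a rough implied index of weekly hours. The sketch below does this for manufacturing, using annual averages from Table 2. It is only an approximation: the published figures are rounded, and the weekly and hourly series do not always receive the same seasonal adjustment.

```python
# Manufacturing annual averages from Table 2 (1939 = 100).
hourly = {1939: 100, 1941: 115, 1944: 161, 1946: 171}
weekly = {1939: 100, 1941: 124, 1944: 193, 1946: 183}

# Implied index of average weekly hours (1939 = 100), approximate only.
implied_hours = {year: 100 * weekly[year] / hourly[year] for year in hourly}

# The implied work week is longest in 1944 and shorter again by 1946.
peak_year = max(implied_hours, key=implied_hours.get)
```

On these figures the implied manufacturing work week in 1944 was roughly one fifth longer than in 1939, and by 1946 most of that lengthening had been reversed. This matches the account above of weekly earnings falling after the war while hourly earnings went on to new highs.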



























FARM EMPLOYMENT LEVELS IN RELATION TO 
SUPPLY AND DEMAND AS PER CENT 
OF NORMAL* 


WALTER A. HENDRICKS
Bureau of Agricultural Economics, U. S. Department of Agriculture, 
Raleigh, North Carolina 


Data on farm labor supply and demand as per cent of 
normal were collected from farmers by the Department of 
Agriculture from 1918-45. No detailed study of these data 
has been made previously. The present analysis shows the 
nature of the relationship between those data and correspond- 
ing data on numbers of workers employed in Agriculture and 
numbers of persons available for employment. The series on 
supply and demand as per cent of normal seem to have re- 
flected the farm labor situation fairly well. Supply as per cent 
of normal, as reported by farmers, seems to reflect the number 
of persons available for hire at current wage rates but not yet 
employed in agriculture. Reported data on demand as per cent 
of normal seem to reflect the number of vacant jobs that 
farmers were trying to fill by recruiting workers. 


FIGURES on farm labor supply and demand as per cent of normal
have been obtained from voluntary crop reporters by the United
States Department of Agriculture since April 1918. This information 
was collected as of April 1 for the period 1918-23, as of July 1 in 1923, 
monthly from October 1923 to July 1932 (except for January 1 during 
the years 1927-29), and quarterly as of January 1, April 1, July 1, 
and October 1 for the years 1932 to date. The collection of these data 
by the Department of Agriculture stopped as of April 1, 1945. The 
data were obtained on a general crop inquiry by mail from a sample of 
voluntary respondents, most of whom are farmers, who reported for 
their localities on supply as per cent of normal and demand as per cent 
of normal. Instructions, which remained unchanged during the entire 
period, requested the respondents to “report per cent of present farm 
labor supply and demand at current wage rates in comparison with 
normal supply and demand at this season of the year.” Returns on 
these questions were received quarterly from about 12,000 respondents. 
The entire series of data up to April 1, 1941, is available in published 
form by geographic divisions.¹ The series in index number form may be

* Joint contribution from the Department of Experimental Statistics, North Carolina Agricultural 
Experiment Station and the Bureau of Agricultural Economics, U. S. Department of Agriculture.
Published with the approval of the Director as Paper No. 216 of the Journal Series. 


1 U. S. Department of Agriculture, Farm Labor Supply and Demand Statistics by Geographic
Divisions, Crops and Markets 18(5): 102-3, May 1941. 










272 AMERICAN STATISTICAL ASSOCIATION 


found in the February 1943 Farm Labor Report, supplemented by the 
October 1944 Farm Labor Report, issued by the Bureau of Agricultural 
Economics of the Department of Agriculture. The data were con- 
verted into index numbers in order to avoid misinterpretation of “per 
cent of normal.” As a further safeguard against unwarranted use of the 
data, the Department of Agriculture discouraged all attempts at 
translating the information into quantitative estimates of persons 
available for farm work or persons hired for farm work. 

It should be noted that farmers were reporting percentages of a 
normal that may vary from place to place and that may also be subject 
to seasonal fluctuations. A normal supply at each of two different loca- 
tions may mean two entirely different levels of persons available for 
hire and a normal demand at each of two different locations may mean 
two distinct levels of need for farm workers. A normal supply or de- 
mand may also mean two different levels of persons available for work 
or persons needed at the same place at two different seasons of the 
year. A question may well be raised, however, as to whether the data 
have quantitative value when they are properly interpreted. 

Data for the Pacific Coast provide an example of the way in which 
variations in the supply index correspond to changes in the supply of 
available workers. In October 1941, just before Pearl Harbor, the index 
was 69. One year later the index was down to 47. It is well known that 
during this period the Japanese, many of whom were farmers, had been 
evacuated and that many thousands of persons had gone into the 
armed forces or into nonagricultural employment. Hence, a sharp drop 
in the supply of farm workers was to be expected. The following October 
the supply index rose to 62. Again, this was to be expected in view of 
the strenuous efforts made by various agencies to recruit farm workers 
from other than usual sources, such as city dwellers, farm women and 
children, foreign workers, and the aged. Furthermore, the replacement 
of the evacuated Japanese by other workers had largely been ac- 
complished by the fall of 1943. Thus, it appears that the supply index 
correctly reflected the changes that were actually taking place. 

Relationships of this kind suggest that the data on supply and 
demand as per cent of normal can be interpreted in quantitative terms. 
This does not imply that the data reported in the two series were satis- 
factory in all respects or that there is no need for better measures of 
supply and demand. The viewpoint adopted here is that the two series, 
despite the subjective nature of the data, reflected conditions in the 
farm labor situation, and that it may be possible to determine reason- 
ably well how those conditions were reflected so that some indication 
may be obtained of the utility of such series. So far as information is
available, this possibility has never been adequately investigated. It
seems reasonable to suppose that the reported “supply as per cent of 
normal” would bear a direct relation to the number of workers avail- 
able for hire in any one locality, but there may be some question as to 
what the respondents interpreted as “available for hire.” It is likely 
that some respondents might include persons already employed on 
farms whereas other respondents might not. The reported “demand as
per cent of normal” probably reflects the number of vacant jobs that
farmers were trying to fill by recruiting workers. The mere existence of
vacant jobs would not necessarily result in a demand for workers;
economic conditions would have to be such that farmers would want to
hire workers to fill those vacancies.

CHART 1
FARM LABOR SUPPLY AND DEMAND AS PER CENT OF NORMAL
(U. S. Averages as of April 1, 1918-44)

It should be possible to ascertain whether or not the reported data on 
supply and demand as per cent of normal actually measure the phe- 
nomena they are intended to measure by investigating the behavior of 
the two series over a long period of time and by studying the relation- 
ship of the data to corresponding data on levels of farm employment 
and numbers of persons available for employment in agriculture. If the 
data on supply and demand have some validity as quantitative measures
of the pressures affecting farm employment, they should exhibit a
fairly high correlation with such data, and the nature of the relation- 
ship should furnish a basis for interpreting the series on supply and 
demand. 

First of all, consider the fluctuations in the series over long periods of
time. Supply and demand as of April 1 reflect conditions in any one
year. Chart 1 shows the variations in supply and demand (per cent of
normal) as of April 1 for the United States from 1918 to date. The
inverse relationship between the two series, which is at once apparent,
has been responsible for the viewpoint expressed by Black* that either
of the two contains all of the pertinent information. Black concluded
that the series “reflect changes in farm labor supply very largely and
changes in farm labor demand very little.” This concept has received
considerable publicity and represents a viewpoint that has been rather
widely accepted. However, the question does not seem to have been
subjected to critical investigation at any time. Although there is a high
degree of negative correlation between the two series, there is good
reason to expect such a relationship even though the two percentages
were independently estimated as measures of two distinct situations.
An examination of the data in Table 1 sheds considerable light on the
problem.

* John D. Black, “Agricultural wage relationships,” Review of Economic Statistics 18(1): 8-15, February 1936.

CHART 2
HIRED FARM WORKERS IN THE UNITED STATES AS OF APRIL 1 AND
MAY 1, COMPARED WITH ESTIMATES DERIVED FROM APRIL 1
SUPPLY AND DEMAND AS PER CENT OF NORMAL

TABLE 1

HIRED WORKERS APRIL 1 AND MAY 1, COMPARED WITH APRIL 1
SUPPLY AND DEMAND AS PER CENT OF NORMAL

Year    Supply      Demand      Hired workers   Hired workers
        April 1     April 1     April 1         May 1
        Per cent    Per cent    Thousands       Thousands
1925       91          90          2744            3083
1926       90          91          2830            3161
1927       91          89          2786            3191
1928       96          89          2768            3257
1929       94          90          2792            3279
1930      100          85          2665            3175
1931      114          72          2655            2935
1932      122          64          2365            2740
1933      126          60          2240            2594
1934      107          70          2312            2566
1935      102          74          2077            2389
1936       94          82          2508            2648
1937       87          88          2269            2652
1938       94          82          2250            2643
1939       93          83          2109            2645
1940       92          84          2010            2483
1941       76          92          1991            2423
1942       61          98          2010            2397
1943       52         104          1875            2244
1944       50         104          1679            1989











For a period of about 10 years, starting with 1925, there is a distinct 
positive correlation between demand and the number of hired farm 
workers. This correlation later disappears and finally becomes negative 
near the end of the series. The data on supply account for this behavior. 
The positive correlation between demand and number of hired workers 
during the first period is present because the supply was apparently 
sufficient to satisfy the demand at current wage rates. In more recent 
years the supply was too low to satisfy the demand at current wage 
rates and an increase in demand was not accompanied by a higher level 
of employment. The employment level at any time thus appears to be a 
joint function of supply and demand, and such a situation is logical. 
The simplest joint regression function that might be used to describe
the relationship between numbers of hired workers and supply and
demand percentages is of the form $y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2$, in which
$y$ represents the number of hired workers (thousand persons), $x_1$ represents
supply as per cent of normal and $x_2$ represents demand as per cent
of normal. This equation was fitted to the data on hired workers as of
April 1 and the hired workers as of May 1, using the April 1 supply and
demand percentages as the independent variables in both cases. Chart 2
shows how well the equation fits the data. The numerical values of the
4 parameters in the regression equation are as follows:


               April            May
$a_0$ =     - 7642.21        - 8750.94
$a_1$ =     + 48.2512        + 54.7639
$a_2$ =     + 68.7837        + 80.2790
$a_3$ =     - 0.0327150      - 0.0435730
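
As a rough illustration of how the fitted equation translates the two percentages into numbers of workers, the sketch below evaluates it for a single year. The coefficients are those quoted above and the 1930 inputs (supply 100, demand 85) are taken from Table 1; the function and variable names are illustrative assumptions and form no part of the original study.

```python
# Fitted parameters (a0, a1, a2, a3) as quoted in the text.
APRIL = (-7642.21, 48.2512, 68.7837, -0.0327150)
MAY = (-8750.94, 54.7639, 80.2790, -0.0435730)


def estimated_hired_workers(params, supply, demand):
    """Evaluate y = a0 + a1*x1 + a2*x2 + a3*x1*x2 (y in thousands of persons).

    supply (x1) and demand (x2) are the April 1 percentages of normal.
    """
    a0, a1, a2, a3 = params
    return a0 + a1 * supply + a2 * demand + a3 * supply * demand


# 1930: supply 100, demand 85; Table 1 reports 2665 (April 1) and 3175 (May 1).
april_1930 = estimated_hired_workers(APRIL, 100, 85)
may_1930 = estimated_hired_workers(MAY, 100, 85)
print(f"April 1 estimate: {april_1930:.0f} thousand (reported 2665)")
print(f"May 1 estimate:   {may_1930:.0f} thousand (reported 3175)")
```

For 1930 the estimates come to roughly 2,751 thousand for April 1 and 3,179 thousand for May 1, against reported figures of 2,665 and 3,175 thousand.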


These results indicate that supply and demand as per cent of normal 
can be translated into numbers of workers. In connection with the 
present study, that fact is important only to the extent that it helps to 
interpret the data on supply and demand as per cent of normal. The
supply and demand series apparently reflect the economic conditions 
that affected employment levels and the relationship between those 
data and the farm-employment data furnishes some evidence that such 
is the case. There has been a general downward trend in farm employ- 
ment during the past 20 years; during the first half of the period this 
trend was caused largely by a decreasing demand, whereas, during the 
latter half of the period, it was caused by a decreasing supply. The basic 
economic conditions inducing a decreasing demand during the first half 
of the period and a decreasing supply during the latter half are gen- 
erally understood at the present time and need no elaboration here. All 
that is necessary is to point out that the supply and demand series 
seem to reflect the pressures affecting farm employment that they were 
designed to measure. 

The above discussion does not preclude the possibility that the in- 
verse correlation between the supply and demand series is so high that 
either contains all of the pertinent information. That possibility can be 
ruled out, however, by a simple test of the data. It has been shown that 
the number of hired workers at any time is related to supply and de- 
mand by a simple joint regression function. If supply and demand were 
so closely correlated with each other that either could be expressed as a 
function of the other, a smooth curve, presumably a second degree 
parabola, would be obtained when the number of hired workers is
plotted against supply or against demand. The scatter diagrams actually
obtained when hired workers as of April 1 are plotted against
April 1 supply and demand as per cent of normal are shown in parts A
and B of Chart 3. Obviously, neither supply nor demand alone can account
for changes in levels of farm employment. These charts effectively
refute the argument that the series on supply and demand merely re- 
flect changes in supply as suggested some time ago by Black. 
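
The reasoning behind this test can be written out under one simplifying assumption. Suppose demand were an exact linear function of supply, $x_2 = c + d\,x_1$, where $c$ and $d$ are hypothetical constants introduced here only for illustration. Substituting into the joint regression function gives

$$y = a_0 + a_1 x_1 + a_2 (c + d\,x_1) + a_3 x_1 (c + d\,x_1) = (a_0 + a_2 c) + (a_1 + a_2 d + a_3 c)\,x_1 + a_3 d\,x_1^2 ,$$

which is a second-degree parabola in $x_1$ alone. Points scattered widely about any such curve therefore indicate that the second series carries information not contained in the first.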

The statistical significance of the various quantities in the two regres- 
sion equations previously given may be tested by analysis of variance. 
Table 2 gives the mean square assignable to the gross regression of 
number of workers on supply, the mean square attributable to the in- 
clusion of demand as a second variable, and the mean square attribu- 
table to including the product of supply and demand as a third variable. 

TABLE 2

ANALYSIS OF VARIANCE OF HIRED WORKERS, SHOWING EFFECTS OF
VARIABLES IN JOINT REGRESSION EQUATIONS

                                  Degrees            Mean square
Source of variability             of freedom     April 1        May 1
Supply                                1            595,483      560,555
Demand                                1          1,379,168    1,841,552
Product of supply and demand          1              1,047        1,857
Error                                16             24,126       11,146
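
One way to read a table of this kind is to divide each regression mean square by the error mean square and compare the resulting variance ratio with a tabled value of F. The sketch below does this for the May 1 column. The critical value of about 4.49 (5 per cent point of F for 1 and 16 degrees of freedom) is supplied here as an assumption, since the text does not state which significance level was used, and the names are illustrative.

```python
# Mean squares from Table 2, May 1 column; each regression term has 1 d.f.
mean_squares = {"supply": 560_555, "demand": 1_841_552, "product": 1_857}
error_mean_square = 11_146  # error term, 16 degrees of freedom

# Assumed tabled 5 per cent point of F for 1 and 16 degrees of freedom.
F_CRITICAL_5_PER_CENT = 4.49


def variance_ratios(regression_ms, error_ms):
    """Return the ratio of each regression mean square to the error mean square."""
    return {term: ms / error_ms for term, ms in regression_ms.items()}


ratios = variance_ratios(mean_squares, error_mean_square)
for term, f_value in ratios.items():
    verdict = "significant" if f_value > F_CRITICAL_5_PER_CENT else "not significant"
    print(f"{term:8s} F = {f_value:8.2f}  ({verdict} at the 5 per cent level)")
```

The ratios for supply and demand are far above the critical value, while the ratio for the product term is well below 1.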





This analysis shows that the linear regressions on supply and demand 
are statistically significant but that nothing was gained by including 
the product term. A simple multiple regression of number of workers on 
supply and demand would have served just as well. The corresponding 
correlation coefficients are as follows: 











                                                               April 1    May 1
Correlation of workers with supply                              0.5021    0.4659
Correlation of workers with supply and demand                    .9144     .9645
Correlation of workers with supply, demand, and product term     .9146     .9649





The fact that supply and demand can be translated into number of 
hired workers suggests that they may also be translated into number of 
persons available for hire. No satisfactory series of persons available for 
agricultural employment is available to test such a relationship. The 
April 1942 Farm Labor Report issued by the Bureau of Agricultural 
Economics contained a series on “persons comprising the primary 
sources of the farm labor supply” for the years 1920-42. In brief, this
series included all persons of working age except those in military
service and in nonagricultural employment. The series has been
criticized on the ground that it includes all urban unemployed, many of
whom would not actually be available for farm employment except
possibly under a drastic system of regimentation that is never likely to
be invoked. It is certain that the series includes many persons that the
farmer does not consider a part of the available supply. On the other
hand, it seems reasonable to expect a fairly close relationship between
urban unemployment and persons available for farm work because both
are affected by changes in national economic conditions. Consequently,
the above-mentioned series should be highly correlated with supply
and demand as reported by farmers. Demand enters into the relationship
because the series on “persons comprising the supply” contains
those already employed in agriculture, whereas the reported supply
as per cent of normal does not seem to include them. Chart 1 shows that
supply as per cent of normal varies between a peak of 126 per cent
reached in 1933 and a low of 50 per cent reached in 1944. The data on
“persons comprising the supply” given in Chart 4 show a much
smaller spread on a percentage basis. This would indicate that farmers
tend to regard persons already employed in agriculture as persons not
available for hire when submitting their estimates. Applying the same form
of joint regression function used for the hired-worker series
to the series on “persons comprising the supply” gives the results
shown in Chart 4. The numerical values of the parameters in the
regression equation are:


$a_0$ = + 43449.25
$a_1$ = + 119.377
$a_2$ = - 160.017
$a_3$ = - 1.79971


Numbers of persons were rounded to the nearest thousand in the
computations, and supply and demand were expressed in terms of per cent
of normal as before.

CHART 4
“PERSONS COMPRISING THE PRIMARY SOURCES OF THE FARM LABOR
SUPPLY,” COMPARED WITH ESTIMATES DERIVED FROM APRIL 1
SUPPLY AND DEMAND AS PER CENT OF NORMAL
(Legend: number potentially available; estimated number)

Chart 4 shows a fairly close agreement between “persons comprising 
the supply” and corresponding values computed from supply and 
demand as per cent of normal. The computed values are consistently 
too high for the years 1924-29 and consistently too low for the years 
1937-40. It is well to remember, however, that the data on supply 
reported by farmers are not strictly comparable with “persons com- 
prising the supply” for reasons already given. The data on supply and 
demand as per cent of normal may reflect the farm-labor situation more 
accurately than the series with which they are being compared. 

The statistical significance of the various quantities in the regression 
equation may be tested by analysis of variance as before. The results 
are given in Table 3. 


TABLE 3

ANALYSIS OF VARIANCE OF PERSONS COMPRISING THE SUPPLY, SHOWING
EFFECTS OF VARIABLES IN JOINT REGRESSION EQUATION

Source of variability             Degrees of freedom     Mean square
Supply                                    1              175,345,400
Demand                                    1               58,851,584
Product of supply and demand              1                2,402,344
Error                                    19                1,966,483








Here again the linear effects of supply and demand are both significant, 
with no significant contribution from the product term. The correlation 
coefficients are: 

Correlation with supply                                  0.8000
Correlation with supply and demand                        .9246
Correlation with supply, demand, and product term

It is interesting to note that the simple correlation with supply alone is
fairly large but that including demand as a second variable had a
significant additional effect. As previously stated, demand was included
in the equation under the assumption that “supply as per cent of
normal” by itself would not adequately reflect trends in the series of
“persons comprising the supply” because of the farmers’ psychology
in interpreting the term. It is also worthy of mention that “supply as
per cent of normal” is reported according to existing wage rates.
Therefore, it is possible that “demand as per cent of normal” enters the
picture as an expression of the wage-rate effect since supply and demand
as here defined are highly correlated with wage rates. However, a
study of those correlations is beyond the scope of this paper.
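
Correlation coefficients of this kind can be recovered from an analysis of variance table as the square root of the ratio of the accumulated regression sum of squares to the total sum of squares. The sketch below applies that relation to Table 3, on the assumption that the three regression lines of the table are successive (sequential) contributions, each with one degree of freedom. The names are illustrative, and the third value printed is a computed figure, not one quoted from the study.

```python
import math

# Table 3 mean squares; each regression term has 1 degree of freedom,
# so its sum of squares equals its mean square.
ss_supply = 175_345_400
ss_demand = 58_851_584
ss_product = 2_402_344
error_mean_square, error_df = 1_966_483, 19

ss_error = error_mean_square * error_df
ss_total = ss_supply + ss_demand + ss_product + ss_error


def multiple_correlation(ss_regression, total):
    """Square root of the share of the total sum of squares explained."""
    return math.sqrt(ss_regression / total)


r_supply = multiple_correlation(ss_supply, ss_total)
r_supply_demand = multiple_correlation(ss_supply + ss_demand, ss_total)
r_all_terms = multiple_correlation(ss_supply + ss_demand + ss_product, ss_total)

print(f"supply alone:                {r_supply:.4f}")
print(f"supply and demand:           {r_supply_demand:.4f}")
print(f"supply, demand and product:  {r_all_terms:.4f}")
```

The first two values agree with the coefficients 0.8000 and .9246 given above; the third comes to about 0.9293.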


SUMMARY AND CONCLUSIONS 


In conclusion, it appears that the data on supply and demand as 
per cent of normal have considerable utility in describing the farm 
labor situation at any time. No comparable data are now available from 
any other source. They help to explain changes in employment levels 
and thus provide information that cannot be obtained from a series of 
employment estimates alone. 

When farmers reported supply as per cent of normal, it appears 
that they reported on the relative number of persons available for hire 
but not yet employed. If workers already employed on farms were in- 
cluded, it seems unlikely that the series on supply would show such 
wide fluctuations. The data on demand as per cent of normal appear to 
relate to the relative number of vacant jobs that farmers were trying 
to fill by recruiting workers. It should be emphasized that vacant jobs 
will result in demand only when conditions are such that farmers are 
trying to recruit workers to fill them. When an adequate supply is 
available an increased demand will result in a higher level of employ- 
ment, but when the supply is low an increased demand will not result in 
increased employment. In such cases an increasing demand may even 
be associated with decreasing employment levels. 

The series on supply and demand apparently measure the factors 
they were designed to measure. Both must be taken into account, be- 
cause neither one alone accounts for all changes in the other. This con- 
clusion is contrary to opinions previously held by some investigators, 
but the results of the present study seem to warrant it. There is a 
marked inverse correlation between the two series, but this appears to 
be a natural consequence of the fact that economic conditions inducing 
a low supply often induce an increasing demand. The correlation be- 
tween the two series, however, is not sufficiently large to permit either 
to be disregarded. 








THE USE OF THE ANGULAR TRANSFORMATION 
IN BIOLOGICAL ASSAYS* 


LILA F. KNUDSEN AND JACK M. CURTIS
Federal Security Agency, Food and Drug Administration¹


A comparison is made of the use of the angular transforma- 
tion and the use of the probit transformation in evaluating re- 
sults from biological assays having percentage responses. It is 
found that the angular transformation results in making the 
weights dependent on the number of animals used on each 
dose and if these are equal the weights are eliminated. The 
logit transformation is also cited. 

A two-dose assay design used in conjunction with the angu- 
lar transformation results in a simplified calculation which can 
be put in the form of a graph and nomograph for use in the 
laboratory to estimate both potency and error of the assay as 
a per cent of the standard. 

A comparison of the calculation time involved shows that 
the probit method requires about twelve times as long as the 
angular transformation method involving graph and nomo- 
graph. A comparison of results obtained shows very little dif- 
ference. 


INTRODUCTION 


Much commendable work has been done in the application of statistical
techniques to the design and evaluation of biological assays
involving quantal or percentage responses. The earliest approaches
to this problem were made by Trevan² and Gaddum.³ Bliss⁴,⁵ has recently
developed the application of these techniques and has greatly
contributed to the dissemination of knowledge concerning their use. In
a few words the approach of these authors consisted of transforming
the percentage response into equivalent normal deviates by use of a
normal probability table, and using the logarithm of the dose instead
of the dose itself. It was shown that by use of these transformations the
dosage response curve became approximately linear over most of its
range. Bliss added 5 to the equivalent normal deviate to avoid negative
numbers and labeled the resulting figure a “probit” (or probability
unit). Because of the transformation of the percentage response to
probits, a special weighting factor was necessary, greatly increasing
the labor of calculation.
This weighting factor was

$$w = \frac{n z^2}{p q}$$

where p = proportion of animals responding to a single dose
      q = 1 - p
      n = number of animals used on a single dose
      $z = \dfrac{dp}{dy} = \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$

and is obtained from

$$p = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\, dt ,$$

y being the equivalent normal deviate.


* Presented at the joint meeting of the Biometrics Section and the A.A.A.S. in Boston, December
29, 1946.

1 The authors gratefully acknowledge the kind cooperation and advice of Dr. Arnold J. Lehman,
Chief, Division of Pharmacology, in the preparation of this manuscript, and further acknowledge the
excellent drafting of the charts and nomograph by Stanley E. Srensek of the Division of Pharmacology,
and Reuben Soler of Food Division, Food and Drug Administration.

2 Trevan, J. W., “The Error of Determination of Toxicity,” Proceedings of the Royal
Society (London) B, Vol. 101, pp. 483-514, 1927.

3 Gaddum, J. H., “Methods of Biological Assay Depending on a Quantal Response,” Medical
Research Council (Brit.), Special Rept. Series, No. 183, 1933.

4 Bliss, C. I., “Calculation of the Dosage-Mortality Curve,” Annals of Applied Biology, Vol. 22,
pp. 134-167, 1935.

5 Bliss, C. I., “The Determination of the Dosage-Mortality Curve from Small Numbers,” Quarterly
Journal of Pharmacy and Pharmacology, Vol. 11, pp. 192-216, 1938.


In order to make the weighting factor more accurate, a preliminary 
freehand straight line was fitted to the data which had been plotted 
probit vs. log dose. The values of the expected probits were read off the 
freehand line. Then by means of these expected probits, the corrected 
probits and the proper weights to be used in the calculations were read 
from a set of tables. 
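
The probit and its weighting factor, which the procedure above reads from tables, can also be computed directly. The following is a minimal sketch, assuming the weight $w = nz^2/pq$ is evaluated at the expected proportion p; the function names are illustrative and the code relies only on the Python standard library.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit(p):
    """Equivalent normal deviate of the proportion p, plus 5."""
    return _STANDARD_NORMAL.inv_cdf(p) + 5.0


def probit_weight(p, n):
    """Weighting factor w = n * z**2 / (p * q) for n animals on one dose.

    z is the ordinate of the normal curve at the deviate corresponding to p.
    """
    y = _STANDARD_NORMAL.inv_cdf(p)
    z = _STANDARD_NORMAL.pdf(y)
    return n * z * z / (p * (1.0 - p))


for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"p = {p:.2f}  probit = {probit(p):.3f}  weight (n = 40) = {probit_weight(p, 40):.2f}")
```

The weight is greatest at a 50 per cent response and falls away toward either extreme, which is why it must be looked up or recomputed for every dose.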

Although the accuracy of the determination of potency, error of the 
assay, slope of the regression line, etc., by the probit method is satis- 
factory, the method requires laborious, time-consuming calculations. 
While it is true that such calculations can be performed routinely by 
clerks after much practice, it is very difficult to explain the object of 
the procedures to the non-statistician. 

Of the short-cut methods developed to alleviate the tedious aspects
of the probit method, the two most applicable to the type of data and
type of problem with which this paper deals are those of de Beer⁶ and
Miller.⁷ Miller has modified the usual probit method by eliminating
the refinements of probit measurement, limiting the doses to two and
standardizing the number of animals so that the formulas used in the
calculations for potency and standard error of assay could be simplified.
The objections cited to the probit method of calculation apply in a
limited way to Miller’s procedure. De Beer’s approach is a graphic
presentation of the probit method of interpretation of biological assays.
It involves the use of two nomographs and three separate scales for use
on specially designed paper and entails several calculations. The success
of the use of this method depends on the ability of the assayist
(1) to design the assay so that the average response on both standard
and unknown will be 50 per cent (which allows simplification of the
formula for the standard error) and (2) to fit freehand parallel lines to
the data as plotted on the special log-probit paper.

FIGURE 1. COMPARISON OF ANGULAR TRANSFORMATION (θ) WITH
LOGIT AND PROBIT TRANSFORMATIONS
(Axis label: percentage)

6 de Beer, E. J., “The Calculation of Biological Assay Results by Graphic Methods. The All-or-none
Type of Response,” Journal of Pharmacology and Experimental Therapeutics, Vol. 85, pp. 1-13, 1945.

7 Miller, L. C., “The U.S.P. Collaborative Digitalis Study Using Frogs 1939-1941,” Journal of the
American Pharmaceutical Association, Vol. 33, p. 258, 1944.

Mention should also be made of the logit method suggested by Berkson.⁸
This method uses weights of the form

$$w = npq.$$

Berkson gives only the LD₅₀ (lethal dose causing 50 per cent death)
and does not give the formulas for the combination of two such parallel
dosage response curves in order to calculate the potency and standard
error of the assay, though presumably these could be developed
mathematically. Several examples of data are given to show that the logit
method gives a better fit to actual data than the probit method.

8 Berkson, J., “Application of the Logistic Function to Bio-Assay,” Journal of the American
Statistical Association, Vol. 39, pp. 357-365, 1944.

FIGURE 2. GRAPH OF TYPICAL TWO-DOSE ASSAY
(Axis label: log dose)


Therefore, for the purpose of general use, a method has been worked 
out with the object of overcoming the above-cited objections and 
based on the following considerations. 

(1) Independence of any weighting factor other than the number of 
animals used. 

(2) The use of transformation of the percentage response to units 
other than equivalent normal deviates (probits) such that the dosage- 
response curve becomes approximately linear over most of its range. 

(3) The limitation of certain variables in the assay (i.e., the number 
of animals, the number of doses, and the dose interval used) so that a
graph may be constructed for the potency as per cent of the standard
and a nomograph for the standard error of the assay also as per cent 
of standard. 


WEIGHTING FACTOR 
There is at least one transformation which accomplishes the first aim
and that is the angular transformation

$$p = \sin^2 \theta' .$$

Here $p$ is the proportion responding, and $\theta'$ is in radians. The weight to
be used with this transformation is

$$\text{weight} = \frac{1}{\sigma_{\theta'}^2} = 4n .$$

This transformation was originally proposed in another form by
R. A. Fisher.⁹ Bartlett,¹⁰ in his paper on the square root transformation,
discusses the analogous transformation which in the terminology used
here is

$$\theta' = \sin^{-1} \sqrt{p} .$$

Bliss¹¹ gives a table of the angular transformation. It is also discussed at
some length in a book edited by Eisenhart, Hastay and Wallis.¹² Most
of the tables give $\theta$ in degrees instead of radians; therefore a constant
is involved in the equation for the weight.

$$\text{weight} = \frac{1}{\sigma_{\theta}^2} = \frac{4\pi^2 n}{(180)^2} = \frac{4n}{(57.2958)^2}$$

Thus the weight to be used is entirely dependent on the number of animals
used on each dose. The weights become equal when the experiment
is so designed as to use the same number of animals on each dose.
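
The transformation and its weight are simple enough to compute directly instead of reading them from a table. The sketch below is illustrative only: the function names are assumptions, and the conversion factor 180/π (about 57.2958) is obtained from the standard library.

```python
import math

DEGREES_PER_RADIAN = math.degrees(1.0)  # 180 / pi, about 57.2958


def angular_degrees(p):
    """Angle theta, in degrees, such that p = sin(theta) ** 2."""
    return math.degrees(math.asin(math.sqrt(p)))


def angular_weight(n):
    """Weight 4n / (57.2958) ** 2 for theta in degrees; p does not enter."""
    return 4.0 * n / DEGREES_PER_RADIAN ** 2


for responders, n in ((6, 40), (20, 40), (24, 40)):
    p = responders / n
    print(f"{responders}/{n}: theta = {angular_degrees(p):.1f} degrees, weight = {angular_weight(n):.4f}")
```

Because the weight depends on n alone, every dose given to the same number of animals receives the same weight, whatever response is observed.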


LINEARITY OF DOSAGE-RESPONSE CURVE


It remains to be shown that the angular transformation is as
good as the probit or the logit transformation in linearizing the
dosage-response curve. The effects of these three transformations (the
probit, the logit, and the angular transformation) can be compared by
plotting all three on the same chart as shown in Figure 1. The tables
used for the probit and angular transformations are given in Fisher and
Yates.¹³ It is obvious that there is little difference between the
transformations in the range 5 to 95 per cent and not too much difference
even beyond these limits. Moreover, in actual laboratory procedures
every effort is usually made to avoid assay responses below 5 per cent
or above 95 per cent. Nevertheless such responses do occur occasionally
and some means of handling them is a practical necessity. Potencies
calculated from a two-dose assay having 0 or 100 per cent response
require caution in interpretation and sometimes they are no more than
rough indications of the actual potency no matter what type of
transformation is used. The angular transformation offers a convenient
method of handling these troublesome values even though the potencies
so obtained require confirmatory evidence.

FIGURE 3. ESTROGENIC ASSAY: GRAPH FOR DETERMINING POTENCY AS
PER CENT OF STANDARD FROM TWO-DOSE METHOD
(H = high dose; L = low dose; H/L = 1.259)

9 Fisher, R. A., “On the Dominance Ratio,” Proceedings of the Royal Society of Edinburgh,
Vol. 42, pp. 321-341, 1921-22.

10 Bartlett, M. S., “Square Root Transformation in Analysis of Variance,” Supplement to the
Journal of the Royal Statistical Society, Vol. 3, pp. 68-78, 1936.

11 Bliss, C. I., “The Analysis of Field Experimental Data Expressed in Percentages,” Plant
Protection No. 12, Leningrad, 1937.

12 Eisenhart, Churchill; Hastay, Millard W.; and Wallis, W. Allen, Selected Techniques of Statistical
Analysis for Science, Industry, Research and Production and Management Engineering, McGraw-Hill,
1946.

13 Fisher, R. A., and Yates, F., Statistical Tables for Biological, Agricultural and Medical Research,
Oliver and Boyd, London, 1938.

FIGURE 4. ESTROGENIC ASSAY: NOMOGRAPH FOR DETERMINING ERROR OF
THE ASSAY AS PER CENT OF STANDARD, FOR i = 0.1 AND N = 20

Thus the aims, 1 and 2, as stated in the introduction are accomplished
in that (1) the use of the angular transformation together with
the logarithm of the dose will cause the dosage-response curve to become
approximately linear over most of its range, and (2) the use of
equal numbers of animals on each dose equalizes the weights.
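
The closeness of the three transformations over the 5 to 95 per cent range can also be checked numerically. The sketch below is an illustration under stated assumptions: the logit is taken as the natural logarithm of p/q, the comparison is made on an evenly spaced grid of percentages, and agreement is summarized by an ordinary correlation coefficient, a yardstick chosen here and not used in the paper. All names are illustrative.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def probit(p):
    return _STANDARD_NORMAL.inv_cdf(p) + 5.0


def logit(p):
    return math.log(p / (1.0 - p))


def angle(p):
    return math.degrees(math.asin(math.sqrt(p)))


def pearson(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


proportions = [k / 100 for k in range(5, 100, 5)]  # 5, 10, ..., 95 per cent
probits = [probit(p) for p in proportions]
r_angle = pearson([angle(p) for p in proportions], probits)
r_logit = pearson([logit(p) for p in proportions], probits)
print(f"angle vs probit: r = {r_angle:.4f}")
print(f"logit vs probit: r = {r_logit:.4f}")
```

A correlation close to 1 means that one scale is very nearly a straight-line function of the other over this range, so a dosage-response curve made linear by one transformation is nearly linear under the other.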


LIMITING CERTAIN VARIABLES IN THE ASSAY 


A further simplification of the calculation involved in the interpretation
of bioassay results is accomplished by the use of a two-dose assay
which has been used by several authors, among them Sherwood, Falco
and de Beer¹⁴ on penicillin, and Knudsen, Smith, Vos and McClosky¹⁵
on epinephrine, for measured responses, and Pugsley and Morrell¹⁶ on
estrogens for percentage responses. The use of a two-dose assay can
result in a simplification of the calculation for potency and error of
the assay in terms of per cent of the standard. Though in some instances
a two-dose assay is not as accurate as a three-dose assay, in other
instances the former is advocated. Assuredly the calculations are simplified
by its use.


DERIVATION 


In the statistical interpretation of a two-dose assay, the following
nomenclature and derivation can be used as in the penicillin assay.¹⁷


$S_H$ = response to high dose of standard in terms of $\theta$
$S_L$ = response to low dose of standard in terms of $\theta$
$U_H$ = response to high dose of unknown in terms of $\theta$
$U_L$ = response to low dose of unknown in terms of $\theta$
$i$ = log ratio of doses = log (high dose/low dose) for both
standard and unknown.


$$V = U_L - S_L + U_H - S_H$$
$$W = S_H - S_L + U_H - U_L .$$


Figure 2 gives a graphic presentation of these terms. The combined
slope, $b_c$, from both standard and unknown is used as the slope for the
parallel lines.


4 Sherwood, M. B.; Falco, E. A.; and de Beer, E. J., “A Rapid Quantitative Method for the 
Determination of Penicillin,” Science, Vol. 99, pp. 247-248, 1944. 

1% Knudsen, L. F.; Smith, R. B.; Vos, B. J.; and McClosky, W. T., “The Biological Assay of 
Epinephrine,” Journal of Pharmacology and Experimental Therapeutics, Vol. 86, pp. 339-343, 1946. 

1% Pugsley, L. I., and Morrell, C. A., “Variables Affecting the Biological Assay of Estrogens,” 
Endocrinology, Vol. 33, pp. 48-61, 1943. 

17 Knudsen, L. F., and Randall, W. A., “Penicillin Assay and Its Control Chart Analysis,” Journal 
of Bacteriology, Vol. 50, pp. 187-200, 1945. 
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The vertical distance between the paraliel lines equals 


Ur—Si+Un-Sx V 


2 2 





Then since the potency of an unknown is equal to the ratio of the doses 
of the unknown and the standard that produce equal responses in the 
animals, the potency is the antilogarithm of the horizontal distance 
between the parallel lines. This is the antilogarithm of M, where 


vertical distance iV 


i = = 


slope Ww 





i 
Therefore, potency as % of standard = antilog (2 + —). 


From the equation for M the standard error of the assay can be derived 
in several ways. The simplest is to state that since the squares of the 
percentage errors are additive in a ratio¹⁸ and V and W are independent
of each other

$$\frac{\sigma_M^2}{M^2} = \frac{\sigma_V^2}{V^2} + \frac{\sigma_W^2}{W^2}$$

but $\sigma_W^2 = \sigma_V^2 = \dfrac{4(57.2958)^2}{4n}$ for θ in degrees

where n = number of animals on each dose. Therefore

$$\sigma_M^2 = \frac{i^2 (57.2958)^2}{W^4 n}\,(W^2 + V^2).$$

This is on a logarithmic basis and can be stated in terms of percentages
as follows:
Standard error of the assay = $\log_e(10) \cdot (\sigma_M) \cdot (\text{potency})$ or
Standard error of the assay = $2.3026 \cdot (\sigma_M) \cdot (\text{potency})$.
An example of the use of the above described method in the interpre- 
tation of the results of a biological assay is presented below. The data 
used in this illustration are from a mouse assay of insulin performed by 


18 Deming, W. Edwards, Statistical Adjustment of Data, John Wiley and Sons, p. 43, 1943.

Marks and are published in Burn’s Biological Standardization,¹⁹ page 96.

Preparation            Dose         No. of Mice   Symptoms
Former standard        1/88 unit    40            6
Former standard        1/44 unit    40            24
Crystalline insulin    0.473γ       40            7
Crystalline insulin    0.946γ       40            22


The assay was designed to check the assumption that crystalline insulin
contained 24 International Units of insulin per mg. It was known
from previous experience that 1/88 and 1/44 of a unit of the standard
gave responses in the critical range. Therefore, if the assumption is
correct, doses of crystalline insulin of 1/88 and 1/44 of 1/24 of a milligram
(.473γ and .946γ) should give the same
response as 1/88 and 1/44 of a unit of the standard. In this insulin
assay the high dose of the standard is just twice the low dose. This is
also true for the unknown. Thus i = log 2 = 0.301. The values of θ are
determined from Table XIII of Fisher and Yates. A convenient form
for the computation is given in Table 1. The resulting calculated value
of 23.7 ± 2.9 units per mg. is to be compared with the potency as reported
by Marks, 23.5 units per mg.


TABLE 1

              S_H      S_L      U_H      U_L
Response      24/40    6/40     22/40    7/40
θ             50.8     22.8     47.9     24.8

$V = U_H + U_L - S_H - S_L = -0.9$
$W = U_H + S_H - U_L - S_L = 51.1$

Potency as per cent of standard $= \text{antilog}\left(2 + \frac{iV}{W}\right) = \text{antilog}\left(2 - \frac{.301(.9)}{51.1}\right) = 98.8\%$

$$\sigma_M^2 = \frac{i^2 (57.2958)^2}{W^4 n}\,(W^2 + V^2) = \frac{(.301)^2 (57.2958)^2}{40(51.1)^4}\,\{(51.1)^2 + (.9)^2\}$$

$\sigma_M = 0.0534$

Standard error of assay $= 2.3026\,(\text{Potency})\,\sigma_M = 2.3026\,(98.8)(.0534) = 12.15\%$

Since the crystalline insulin was estimated as 24 units per mg.,
Potency = 23.7 ± 2.9
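
The two-dose computation of Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' own procedure: the function name `two_dose_assay` and its argument names are hypothetical, the angular responses are taken as already looked up (in degrees), and logarithms are to base 10.

```python
import math

DEG_PER_RAD = 57.2958  # degrees per radian, the constant used in the text


def two_dose_assay(s_h, s_l, u_h, u_l, i, n):
    """Return (potency %, sigma_M, standard error %) for a two-dose assay.

    s_h, s_l, u_h, u_l: angular responses (theta, degrees) at the high and
    low doses of standard and unknown; i: log10(high dose / low dose);
    n: number of animals on each dose.
    """
    v = u_h + u_l - s_h - s_l
    w = u_h + s_h - u_l - s_l
    m = i * v / w                      # horizontal distance between the lines
    potency = 10 ** (2 + m)            # per cent of standard
    sigma_m = math.sqrt(i**2 * DEG_PER_RAD**2 * (w**2 + v**2) / (n * w**4))
    std_error = math.log(10) * sigma_m * potency
    return potency, sigma_m, std_error


# Insulin data of Table 1 (theta values as tabulated).
potency, sigma_m, std_error = two_dose_assay(50.8, 22.8, 47.9, 24.8, i=0.301, n=40)
units_per_mg = 24 * potency / 100      # assumed strength was 24 units per mg
error_units = 24 * std_error / 100
print(round(potency, 1), round(sigma_m, 4), round(units_per_mg, 1), round(error_units, 1))
```

Because the sketch carries full precision rather than the rounded intermediate values of Table 1, the standard error may differ from the tabulated figure in the last decimal place.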


THE USE OF THE GRAPH AND NOMOGRAPH 


If it is further specified that in addition to having the same number
of animals on each dose of standard and unknown, a constant ratio of


19 Burn, J. H., Biological Standardization, Oxford University Press, London, p. 96, 1937.

high to low dose be used (i.e., a constant value of “i”) then the equations
reduce to:

$$\text{Potency} = \text{antilog}\left(2 + c_1 \frac{V}{W}\right)$$

$$\frac{\text{Standard error of the assay}}{\text{Potency}} = \frac{c_2}{W^2}\sqrt{W^2 + V^2}$$

where $c_1$ and $c_2$ are constants. When the equations are in this form a
graph and nomograph²⁰ can be devised.²¹ Figures 3 and 4 are the
graph and nomograph which were constructed for an estrogenic assay
with N = 20 and i = 0.1. The use of the graph and nomograph can be
illustrated by using data from a typical estrogenic assay.


        Response    θ
S_H     18/20       71.6
S_L     14/20       56.8
U_H     17/20       67.2
U_L     10/20       45.0


Calculate the values 
$V = U_H + U_L - S_H - S_L = -16.2$
$W = U_H + S_H - U_L - S_L = 37.0$
and on the chart for potencies (figure 3) find the point V = -16.2 and
W = 37.0. This point falls between the radial lines labeled 90 and 92.
The potency can be estimated as 90.5 per cent. Now calculate the value
of $Z = W^2 + V^2 = 1631$, and on figure 4 connect the point corresponding
to this value on the Z scale with the point on the W scale for 37.0. 
Find the ratio of standard error of assay to potency on the third scale 
as .086. Multiply this value by the potency (90.5) to obtain the stand- 
ard error of the assay as 7.7 per cent. 
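
Where the graph and nomograph are not at hand, the same quantities can be computed directly. The sketch below is illustrative only: it assumes that the constants are $c_1 = i$ and $c_2 = 2.3026 \cdot i \cdot 57.2958/\sqrt{n}$, which is what the general two-dose formulas give when i and n are fixed, and it computes θ from p = sin² θ instead of reading it from a table. The helper name `angle_deg` is hypothetical.

```python
import math


def angle_deg(responders, n):
    """Angular transformation: theta in degrees such that p = sin^2(theta)."""
    return math.degrees(math.asin(math.sqrt(responders / n)))


n, i = 20, 0.1
c1 = i                                                # assumed from M = iV/W
c2 = math.log(10) * i * 57.2958 / math.sqrt(n)        # assumed, see lead-in

s_h, s_l = angle_deg(18, n), angle_deg(14, n)
u_h, u_l = angle_deg(17, n), angle_deg(10, n)

v = u_h + u_l - s_h - s_l
w = u_h + s_h - u_l - s_l
z = w**2 + v**2

potency = 10 ** (2 + c1 * v / w)       # what the graph (figure 3) is read for
ratio = c2 * math.sqrt(z) / w**2       # what the nomograph (figure 4) is read for
std_error = ratio * potency

print(round(v, 1), round(w, 1), round(potency, 1), round(ratio, 3), round(std_error, 1))
```

Values read from a graph or nomograph are necessarily approximate, so small differences from this direct computation are to be expected.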

The data from this assay were also used to calculate the potency and
standard error of the assay by the probit method. The result obtained 
was 91.1% +7.9%. The calculations using the probit method required 
about twelve times as long as those for the angular transformation 
method using graph and nomograph. 


CALCULATIONS FOR ASSAYS USING MORE THAN TWO DOSES 


The method described above applies the graph and nomograph tech- 
nique only to the analysis of data of a quantal response assay which 


20 Lipka, Joseph, Graphical and Mechanical Computation, John Wiley and Sons, 1918.
21 Enlarged copies of the graph and nomograph for estrogenic assay may be obtained from the
Division of Pharmacology, Food and Drug Administration, Washington 25, D. C.
































has two doses each of unknown and standard and which uses the same 
ratio of these two doses on both standard and unknown. The transformation

$p = \sin^2 \theta$


can also be applied to quantal response assays which have more than 
two doses on each of unknown and standard. The equations for potency 
and standard error of the assay are practically the same as those given 
by Gaddum,’ Bliss,5 and Irwin. 

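$$\text{potency} = \text{antilog}\left(2 + \bar{x}_s - \bar{x}_u + \frac{\bar{y}_u - \bar{y}_s}{b_c}\right)$$

error of the assay = $(2.3026) \cdot (\text{potency}) \cdot (\sigma_M)$

where

$$\sigma_M^2 = \frac{1}{b_c^2}\left[\frac{1}{\sum w_{iu}} + \frac{1}{\sum w_{is}} + \frac{(\bar{y}_u - \bar{y}_s)^2}{b_c^2\left[\sum w_{iu}(x_{iu} - \bar{x}_u)^2 + \sum w_{is}(x_{is} - \bar{x}_s)^2\right]}\right]$$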
In the above equations 
$\bar{x}_u$ and $\bar{x}_s$ are the mean log doses for unknown and standard respectively.
$\bar{y}_u$ and $\bar{y}_s$ are the mean response in terms of θ for the unknown and
standard respectively.
$\sum w_u$ and $\sum w_s$ equal the sums of the weights for unknown and standard
where the weight is four times the number of animals used on
that dose,

e.g. $w_{iu} = 4n_{iu}$

in which $n_{iu}$ = number of animals used on the ith log dose of the
unknown ($x_{iu}$).
$b_c$ is the combined slope of standard and unknown.

$$b_c = \frac{\sum w_{iu}(x_{iu} - \bar{x}_u)(y_{iu} - \bar{y}_u) + \sum w_{is}(x_{is} - \bar{x}_s)(y_{is} - \bar{y}_s)}{\sum w_{iu}(x_{iu} - \bar{x}_u)^2 + \sum w_{is}(x_{is} - \bar{x}_s)^2}$$


An illustration of the calculations using this method is given in Table 2
using the data and general tabular procedure of Miller, Bliss and
22 Irwin, J. O., “Statistical Method Applied to Biological Assays,” Supplement to the Journal of the
Royal Statistical Society, Vol. 4, pp. 1-60, 1937.

Braun²³ in an assay which compares routes of injection of digitalis, arbitrarily
labeling the lymph sac route as standard and the intramuscular
route as the unknown. The potency and error of the assay determinations
obtained agree very well with the values they report. The
result obtained by the probit method used by Miller et al. is 208.7 ± 20.4
as compared to the result of 207.8 ± 19.8 obtained by the angular transformation
method.














TABLE 2

                     x                     y
                 log dose   Response       θ      w = 4n     wx       wy       wxy
Std.               .75        2/15       21.4       60       45.     1284.    963.0
(lymph sac)        .85        5/15       35.3       60       51.     2118.   1800.3
                   .95        8/15       46.9       60       57.     2814.   2673.3
                                                            153.     6216.   5436.6
                                                  Σwx² = 131.25
Unknown            .45        2/15       21.4       60       27.     1284.    577.8
(intramuscular)    .55        5/15       35.3       60       33.     2118.   1164.9
                   .65       10/15       54.7       60       39.     3282.   2133.3
                                                             99.     6684.   3876.0
                                                  Σwx² = 55.65

                                          Standard       Unknown
Σw =                                      180            180
1/Σw =                                    .005556        .005556
Σwx =                                     153.0          99.0
x̄ = Σwx/Σw =                              .850           .550
Σwy =                                     6216.0         6684.0
ȳ = Σwy/Σw =                              34.533333      37.133333
[wx²] = Σwx² - x̄Σwx =                     1.200000       1.200000
[wxy] = Σwxy - ȳΣwx =                     153.005100     199.8033
b = [wxy]/[wx²] =                         127.504250     166.502750

Std. and Unknown Combined
Σ[wxy] =                                  352.8084
Σ[wx²] =                                  2.4000
b_c =                                     147.0035
ȳ_s - ȳ_u =                               -2.6000
(ȳ_s - ȳ_u)/b_c =                         -.017687
M = x̄_s - x̄_u - (1/b_c)(ȳ_s - ȳ_u) =      .317687
Potency = antilog (2 + M) =               207.8
(1) 1/Σw_s + 1/Σw_u =                     .011111
(2) (ȳ_s - ȳ_u)²/(b_c Σ[wxy]) =           .000130
σ_M = (1/b_c) √((1) + (2)) =              .000721

Standard error of Assay = 57.3 (2.3026) σ_M (Potency) = 19.7674
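
The tabular procedure of Table 2 can also be sketched as a short Python routine. This is an illustration under stated assumptions, not the authors' worksheet: the function names are hypothetical, sums of squares and products are taken about the weighted means instead of by the machine formulas of the table, and the final factor 57.2958 follows the last line of Table 2, where the weights w = 4n are on a radian basis while θ is in degrees.

```python
import math


def weighted_sums(x, y, w):
    """Weighted means and corrected sums of squares/products for one preparation."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    return sw, x_bar, y_bar, sxx, sxy


def multi_dose_assay(std, unk):
    """Return (potency %, sigma_M, standard error) from (x, y, w) triples of lists."""
    sw_s, xs, ys, sxx_s, sxy_s = weighted_sums(*std)
    sw_u, xu, yu, sxx_u, sxy_u = weighted_sums(*unk)
    b_c = (sxy_s + sxy_u) / (sxx_s + sxx_u)          # combined slope
    m = xs - xu - (ys - yu) / b_c
    potency = 10 ** (2 + m)
    var_m = (1 / sw_s + 1 / sw_u
             + (ys - yu) ** 2 / (b_c ** 2 * (sxx_s + sxx_u))) / b_c ** 2
    sigma_m = math.sqrt(var_m)
    std_error = 57.2958 * math.log(10) * sigma_m * potency
    return potency, sigma_m, std_error


# Digitalis data of Table 2: log dose, theta (degrees), weight w = 4n with n = 15.
standard = ([0.75, 0.85, 0.95], [21.4, 35.3, 46.9], [60, 60, 60])
unknown = ([0.45, 0.55, 0.65], [21.4, 35.3, 54.7], [60, 60, 60])

potency, sigma_m, std_error = multi_dose_assay(standard, unknown)
print(round(potency, 1), round(sigma_m, 6), round(std_error, 1))
```

Small differences from the tabulated figures in the later decimal places can arise from rounding in the intermediate entries of the table.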


23 Miller, L. C.; Bliss, C. I.; and Braun, H. A., “The Assay of Digitalis,” Journal of the American
Pharmaceutical Association, Vol. 28, pp. 644-657, 1939.


DISCUSSION


Several assays of various kinds have been evaluated both by the 
probit method and by the angular transformation method. In all cases 
the agreement between the results has been very good. The longer probit
method seems to have no increased accuracy over the angular transformation
method.

Since it is assumed that the pharmacologist will compare only those 
unknowns which elicit the same type of behavior in the animals as the 
standard (i.e., we assume slopes of the dosage-response curves for the 
standard and unknown will be essentially the same), no test for departure
from parallelism is included. Any difference in slope will be
due to animal variation and will usually increase the size of the standard
error of the assay.

Approximate confidence limits will be given by the use of the formula
for the standard error of the assay. Fieller²⁴ and Irwin²⁵ have given
methods for obtaining more exact limits. Bliss²⁶ has given a summary of
these methods which are primarily for use when the slope does not differ
significantly from zero. In a particular assay this can be avoided by
setting lower limits on W such that the slope will be significantly dif- 
ferent from zero, and specifying that if W is less than this specified 
value, the assay should be repeated. So far, the experience in this 
laboratory seems to indicate that the formulas given here underesti- 
mate rather than overestimate the standard error of an assay. 

The “error of the assay” is chosen as the term in contrast to the “error
of the potency.” This is because of the distinction desired between two
types of errors, namely, (1) how closely an assayist can check himself, 
and (2) how closely he can be checked by another assayist at another 
laboratory. It is clear that the error of the assay as calculated from the 
data of a single assay does not include all the factors that might cause 
variation. A measurement which does include all of these causes might 
be labeled the error of the potency and an estimate of this could be ob- 
tained from a collaborative assay participated in by several labora- 
tories. Of course it is possible that in some instances the factors causing 
variation between laboratories may have little influence on the size of 
the error, but this cannot be assumed until it has been shown that results
from several laboratories exhibit the same statistical control as
the results at any one laboratory.

24 Fieller, E. C., “The Biological Standardization of Insulin,” Journal of the Royal Statistical
Society, Supplement Vol. 7, p. 49, 1940.

25 Irwin, J. O., “On the Calculation of the Error of Biological Assays,” Journal of Hygiene, Vol. 43:
121-128, 1943.

26 Bliss, C. I., “Confidence Limits for Biological Assays,” Biometrics Bulletin, 1: 57-65, 1945.


SUMMARY 


A simplified method is given for evaluating biological assays
having quantal responses. The use of the angular transformation is
advocated to make the weights used proportional to the number of
animals on that particular dose. It is shown that if an equal number
of animals is used on each dose of standard and unknown of a two-dose
assay and a constant ratio of doses is used
($i = \log \frac{\text{high dose}}{\text{low dose}}$ is constant),
a graph for determining potency and a nomograph for estimating the
standard error of the assay can be constructed for use in the laboratory.

A comparison of results calculated by the probit method with those 
calculated by the angular transformation method shows that the two 
methods give practically the same results but that the amount of time 
required to make the calculations on a two-dose assay by the two meth- 
ods is very different. The probit method calculations using the exact 
weights require about twelve times as long as those for the angular 
transformation method.


THE PROBLEM OF PLOT SIZE IN LARGE-SCALE
YIELD SURVEYS 


P. V. Sukhatme, Ph.D., D.Sc.
Statistical Adviser 
Imperial Council of Agricultural Research, New Delhi 


This paper gives the results of the three investigations carried
out on a district-wide scale for comparing the efficiency
of small size plots with those of the order of 1/80th of an acre 
used under the existing official procedure in India. The first 
investigation was on wheat in the United Provinces, and the 
other two on paddy (rice) in the provinces of Bihar and 
Madras. The experiments were conducted by the staff of the 
departments of Revenue and Agriculture posted in the dis- 
tricts, who ordinarily carry out crop-cutting experiments un- 
der the official orders. 

The results of all the three investigations show that there 
is a definite risk of obtaining over-estimates of the average 
yield with small size plots. In contrast, plots of the order of 
1/80th of an acre appear to be free from bias. The results 
also show that small plots fail to furnish unbiased estimates 
of the different components of the true variance. 


Under the existing official procedure crop-cutting experiments
everywhere in India are carried out on sampling units (hereafter
called plots) of large size; e.g. the plot size used in Madras is 1/100th
of an acre (50 links × 20 links), that in Bombay is 1/40th of an acre
(33’ × 33’), that in the Central Provinces is 1/10th of an acre (66’ × 66’),
that in Orissa is 1/160th of an acre (25 links × 25 links), and so on. The
plots are marked with the help of chains and pegs. All the experimental 
yield surveys which the Imperial Council of Agricultural Research 
carried out during the last three years in different provinces all over 
India on the principal food-crops, wheat and paddy, have also been 
carried out with plots of large size varying from 1/160th of an acre to 
1/20th of an acre, and in experiments on paddy in Madras Province 
the whole of the field was harvested [1, 2]. In the experimental surveys 
on cotton conducted by the Indian Central Cotton Committee the plot 
size used was also large, viz. 1/10th of an acre [3, 4, 5]. 

In contrast, workers in England and the U. S. and some in India 
have used very small plot sizes. Thus Cochran in England had used a 
plot size six rows wide each quarter of a metre long [6]. King, McCarty 
and McPeek used plot sizes from 1/4000th to 1/5000th of an acre [7]. 
In India a small sized plot of area 13.6 sq. ft. (1/3200th of an acre) 
was first used by Hubback [8] and in recent years by Mahalanobis [9]. 
These plots are all marked with the help of rigid or semi-rigid frames.

The reasons for adopting the existing large size plot in surveys conducted
by the Imperial Council of Agricultural Research are as follows.
It was considered that the principal reform in the existing official pro- 
cedure of conducting crop-cutting experiments lay in replacing the per- 
sonal selection of fields by an objective procedure based on the method 
of random sampling, and that if previous attempts by Hubback and 
others had failed to make a lasting impression on the method of select- 
ing the field, it was because these experiments had failed to demon- 
strate the practicability of the random sampling method for adoption 
by existing departmental agencies. It was, therefore, considered desir- 
able that in any scheme for the improvement of yield statistics, only 
such minimum changes should be made in the initial stages as would 
guarantee the unbiased character of the yield estimates. As the size of 
plot is unrelated to the question of randomization, it was kept un- 
changed. 

The progress of these surveys soon demonstrated that it is practica- 
ble to use the method of random sampling for conducting crop-cutting 
experiments and that the work can be taken up on a permanent basis as 
departmental routine without heavy additional expenditure. It was, 
however, considered desirable that the scope, if any, for plots of very 
small size should be investigated in view of the fact that such small 
plots were adopted by other workers. A summary containing the re- 
sults of these investigations has already appeared in Nature [10, 11]. 
It is the object of this paper to present the detailed results. 


INVESTIGATION ON WHEAT IN MORADABAD 


The first investigation was carried out in 1944-45 in the district of
Moradabad in the United Provinces. The district has an area of 2,268 sq.
miles. It is divided into six divisions, called tehsils, all of which, except 
Thakurdwara, have an appreciable area under both irrigated and un- 
irrigated wheat. In Thakurdwara, however, almost the whole of the 
area under wheat is unirrigated. The plan of sampling consisted in se- 
lecting 8 villages from each tehsil, four for experiments on irrigated 
wheat and four for experiments on unirrigated wheat, except in Tha- 
kurdwara where all the 8 villages were selected for experiments on un- 
irrigated wheat. In each of the first five tehsils, sampling was done sepa- 
rately for irrigated and unirrigated wheat from amongst all the villages 
in the tehsil. In each selected village, two pure wheat-growing fields 
were selected and in each selected field 8 plots were marked at random: 
(a) two equilateral triangular plots of side 33’ subdivided into three 
strips by means of two lines parallel to the base drawn at distances of
8¼’ and 16½’ from the vertex along the sides, as shown in the diagram;
(b) three circular plots of radius 2’; and (c) three circular plots of 
radius 3’. The two fields selected in each village were irrigated or un- 
irrigated according as the village was selected for experiments on irri- 
gated or unirrigated wheat and were selected at random from amongst 
all the irrigated or unirrigated fields in the selected village.

The two triangular plots of side 33’ were marked with the help of
tapes and pegs and were divided into three strips by means of chains
placed parallel to the base at distances of 8¼’ and 16½’ from the vertex
along the two sides. In harvesting the produce, the three strips were 
harvested in the order starting from the vertex to the base. The circular 
plots were marked with the help of a specially devised apparatus con- 
sisting of a rotating peg, a steel tape and a plumb line. The peg was 
made of wood and was provided with an iron collar at one end and a 
point at the other. It was fixed at a point in the field located by means 
of a pair of random numbers. The steel tape was so fixed to the peg as 
to revolve freely round the centre of the top of the peg. As the tape was 
revolved, the crop was cut from below the level of the tape, making 
room for the tape to move further until the original starting point was 
reached. To avoid trampling, the point located by means of a pair of 
random numbers was not taken as the center of the circle, but was taken 
as a point on its circumference on a line parallel to the length of the 
field. On arriving at this point the worker was asked to cut the crop 
from this point along the direction of the length until he reached a distance
a little more than the radius from this point. From the starting
point the worker then measured exactly a distance equal to the radius
along the direction of the length of the field and fixed the peg at this 
point. This was the centre of the circle. 
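
As a check on the plot geometry described above, the following minimal Python sketch computes the areas of the triangular and circular plots and expresses each as a fraction of an acre. The helper names and the conversion of 43,560 sq. ft. per acre are supplied here for illustration and are not part of the survey procedure.

```python
import math

SQ_FT_PER_ACRE = 43560.0  # standard conversion, assumed here


def equilateral_area(side_ft):
    """Area in sq. ft. of an equilateral triangle with the given side."""
    return math.sqrt(3) / 4 * side_ft ** 2


def circle_area(radius_ft):
    """Area in sq. ft. of a circle with the given radius."""
    return math.pi * radius_ft ** 2


plots = {
    "33' triangle": equilateral_area(33.0),
    "16.5' triangle": equilateral_area(16.5),
    "8.25' triangle": equilateral_area(8.25),
    "3' circle": circle_area(3.0),
    "2' circle": circle_area(2.0),
}

for name, area in plots.items():
    print(f"{name}: {area:.2f} sq. ft., about 1/{SQ_FT_PER_ACRE / area:.0f} acre")
```

The strips nearest the vertex form triangles of side 8¼’ and 16½’, so the three triangular sizes are nested within one 33’ plot, each a quarter of the area of the next larger one.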

The experiments were carried out by the staff of the Department of 
Revenue posted in the district who ordinarily are required to carry 
out these experiments under official orders. Before commencing the 
work, the entire field staff were gathered at Moradabad where the plan 
of work, the duties and responsibilities of each member of the staff, 
the meaning and implications of the instructions drawn up for the conduct
of field work, the manner in which the returns were to be filled
in and despatched, were explained and discussed until the workers obtained
a clear idea of the entire procedure. The training was completed
with practical exercises in the fields in villages around Moradabad.
This was not all. Six members of the staff of the Statistical Section at 
New Delhi were sent to assist the field staff in initiating the work. In 
addition, the entire work was supervised by touring by the Senior 
Supervisor Kanungo, the Officer-in-charge of the district and the statis- 
tical staff from the Centre. 

As explained in an earlier paragraph, a sample of 48 villages was 
selected for the investigation. Out of this total, experiments were per- 
formed in all except one village of the unirrigated group in the Amroha 
tehsil. In one village belonging to the irrigated group in Bilari tehsil, 
experiments were conducted in only one field as the crop in the other 
field was not ripe at the time of visit and was harvested before the 
worker could visit it again. In one field in a village in Thakurdwara 
tehsil, only six plots were gathered. Altogether, out of a total of 768 
plots proposed to be harvested under the survey, 742 were harvested. 

Table 1 shows the average estimated yield in pounds per acre and the 
percentage over-estimation for plots of different sizes taking the yield 
of the 33’ triangle as the standard. The average yield for the district 
was calculated by combining the tehsil averages in proportion to the 
area under wheat in the different tehsils. It will be seen that the esti- 
mate of the average yield for the district decreases as the plot size in- 
creases. The rate of decrease diminishes with increase in the size of plot, 
which suggests that when the plot size is sufficiently large, the estimate 
attains a stable value. The results are consistent for both irrigated and 
unirrigated wheat. They show that small plots less than 30 sq. ft. re- 
sult in serious over-estimation of yield, but even as large a plot as 117.9 
sq. ft. in area is not free from bias. It will be seen that the latter has 
actually given an over-estimate by 4.8 and 11 per cent for irrigated 
and unirrigated wheat respectively.

Table 2 gives, for each separate tehsil, the average estimated yield
in pounds per acre for plots of different sizes. The general tendency to 
overestimate with smaller size plots is demonstrated in most tehsils 
both in the case of irrigated and unirrigated wheat. 
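
The percentage over-estimation quoted for the district as a whole is the excess of each plot's average yield over that of the 33’ triangle, expressed as a percentage of the latter. A minimal Python sketch follows, using the district averages of Table 1; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
def over_estimation(yield_lb, standard_lb):
    """Percentage by which a yield estimate exceeds the standard estimate."""
    return 100.0 * (yield_lb - standard_lb) / standard_lb


# District average yields in pounds per acre (Table 1, Moradabad).
irrigated = {"33' triangle": 831.1, "16.5' triangle": 870.6,
             "8.25' triangle": 961.9, "3' circle": 954.5, "2' circle": 1183.3}
unirrigated = {"33' triangle": 539.0, "16.5' triangle": 598.2,
               "8.25' triangle": 664.9, "3' circle": 618.8, "2' circle": 767.7}

pct_irrigated = {k: over_estimation(v, irrigated["33' triangle"])
                 for k, v in irrigated.items()}
pct_unirrigated = {k: over_estimation(v, unirrigated["33' triangle"])
                   for k, v in unirrigated.items()}

for name in irrigated:
    print(f"{name}: {pct_irrigated[name]:.1f}% irrigated, "
          f"{pct_unirrigated[name]:.1f}% unirrigated")
```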

The bias observed and particularly the magnitude thereof were un- 
expected results of this investigation. The only instance showing bias 
in crop estimation from the use of small size plots is given by Yates 
[12]. He compares the results of experiments with a circular hoop of 
10 sq. ft. in area with those of sample plots of 1/20th of an acre in size 


TABLE 1 (MORADABAD)

AVERAGE YIELD OF WHEAT IN POUNDS PER ACRE WITH PERCENTAGE
OVER-ESTIMATION FOR PLOTS OF DIFFERENT SIZES

                              Irrigated Wheat                               Unirrigated Wheat
Size and   Area of    No. of  Av. yield   St. error    Percentage    No. of  Av. yield   St. error    Percentage
shape of   plot in    plots   in pounds   of average   over-         plots   in pounds   of average   over-
plot       sq. ft.            per acre    yield        estimation            per acre    yield        estimation

33’ △      471.55     78      831.1       77.3         -             107     539.0       60.9         -
16½’ △     117.89     78      870.6       81.5         4.8           107     598.2       67.5         11.0
8¼’ △      29.47      78      961.9       101.2        15.7          107     664.9       68.3         23.4
3’ ○       28.29      117     954.5       90.5         14.9          162     618.8       64.2         14.8
2’ ○       12.57      117     1183.3      93.8         42.4          161     767.7       84.7         42.4





and the field as a whole, and finds a very considerable bias in the use 
of circular hoops. The bulk of the bias, according to him, is due to the 
tendency to cast the hoop on the good parts of the crop. 

In this particular investigation, over-estimation for plots of small 
size can be reasonably ascribed to the tendency on the part of the staff 
to include border plants inside the plot area. Although the instruction 
given to the field staff was that only such plants as are within the plot 
should be harvested, they clearly seem to have experienced a difficulty 
in deciding whether to include a border plant inside the plot or exclude 
it, because a plant has several tillers and a fairly wide stem at the base 
on the ground. In small plots, the contribution of the border plants to 
the harvested produce is appreciable. It is therefore to be expected that 
the inclusion or exclusion of even a few plants should influence the re- 
sults materially in the case of small size plots as compared with large 
ones. 

TEST OF THE BIAS 


An appropriate test of bias is to compare the results of the different
sizes of plots with those obtained from harvesting the whole field. Accordingly
it was intended to harvest the whole field, but the idea had
to be abandoned owing to the large expenditure and effort involved.
However, in previous investigations conducted by the author the average
yield as estimated from 1/100th acre plots was compared with that
obtained from harvesting the whole field. These are reported in [2].
The difference observed between the two estimates was only 30 lbs.
as against a standard error of 158 lbs. Yates obtained similar results [12].
There is, therefore, justification for assuming that the yield
obtained from the 33’ triangle represents an unbiased estimate of the 
average yield per acre. 

In Table 3 are shown the results of the statistical comparison of the 
estimates of average yield obtained from plots of different sizes. The 
tests were made as follows. All the plot yields were first converted to 
a per acre basis. The difference in the average estimated yield for com- 
parisons I and II was calculated by averaging the differences between 
corresponding plots over all the 78 pairs in the case of irrigated wheat 
and all the 108 pairs in the case of unirrigated wheat. In the case of 
comparisons III and IV, however, where there is no such correspond- 
ence between the plots, the converted plot yields were averaged out for 
each size of plot for each field, and the difference between such averages 
for each field was averaged out over all the 39 fields in the case of 
irrigated wheat and all the 54 fields in the case of unirrigated wheat. 
The standard error of the difference shown in columns 3 and 7 was cal- 
culated directly from the individual differences from which the averages 
were obtained, without analysing them as between tehsils, between 
villages and between fields. 

It will be seen that the 16½’ triangle gives a significantly higher
estimate of the average yield per acre for unirrigated wheat than that
given by the 33’ triangle. Even in the case of irrigated wheat the difference
approaches the five per cent level of significance. An 8¼’ triangle
gives a significantly higher estimate of yield than a 16½’ triangle both
in the case of irrigated and unirrigated wheat. These comparisons conclusively
show that over-estimation with decreasing size of plots cannot
be explained by chance errors, but must be ascribed to the possible
tendency to include the border plants inside the sample plot. Comparison
No. III is of special interest, being a comparison of the results
from two plots of different shape of about the same size. The difference
in the average yield as obtained from a triangular plot and a circular
plot of the same size is not significant. The results tend to show that
size, more than shape, predominantly determines the estimate of yield,
although, as explained earlier, for a given size, a circular plot, which
has the minimum perimeter, should be preferable.

ANALYSIS OF VARIANCE 


Table 4 gives the results of the analysis of variance of plot yields for 
the different sizes of plots. It gives estimates of the mean square be- 
tween villages, between fields within villages and between plots within 


TABLE 4
ANALYSIS OF VARIANCE OF PLOT YIELDS IN (10 LBS./ACRE)²

                              33’ △         16½’ △        8¼’ △         3’ ○          2’ ○
                              d.f.          d.f.          d.f.          d.f.          d.f.
Irrigated
Mean square bet. villages     15   3782     15   4170     15   6469     15   7840     15   8377
  V                                 642           505           924           683           684
Mean square bet. fields       19   1288     19   2208     19   2878     19   3743     19   4388
  F                                 405           686           984           992           981
Mean square bet. plots        39    479     39    834     39    909     78    766     78   1445
  P                                 479           834           909           766          1445
Unirrigated
Mean square bet. villages     21   3276     21   4081     21   4176     21   5492     21   9688
  V                                 592           744           780           529          1150
Mean square bet. fields       27    931     27   1130     27   1086     27   2324     27   2836
  F                                 308           390           182           561           610
Mean square bet. plots        53    321     53    360     53    727    108    643    107   1021
  P                                 321           360           727           643          1021




fields, as also estimates of the V, F and P components of variance de- 
rived from the mean squares by the usual formulae [6]. The results 
have been expressed on a pound per acre basis to facilitate comparison 
between the different sizes of plots. It will be seen that P increases as 
the plot size decreases, as is to be expected, but even the values of V 
and F components, which are interpreted as measuring respectively the 
village to village and the field to field variations and which are independent
of the plot size, change with the size of plot. Thus for irrigated
wheat, F shows a steady increase from the 33’ triangle to the 8¼’ triangle,
thereafter apparently remaining constant. For unirrigated wheat
F shows a somewhat irregular behaviour with change in the size of 
plot, but increases on the whole as the plot size decreases. V is rela- 
tively more steady but assumes an unusually large value for the 2’ 
circle in the case of unirrigated wheat. The variation in V and F is 
more than can be expected on the grounds of sampling error. It would
therefore appear that small-size plots suffer not only from a constant
bias but also from a variable bias from field to field which results in 
inflated values of V and F.* Small-size plots thus fail to furnish un- 
biased estimates of both average yield and the components V and F. 
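
The way the F and P components follow from the mean squares of Table 4 can be sketched as below. The sketch assumes the usual balanced nested-design relations, P = mean square between plots and F = (mean square between fields - mean square between plots)/k with k plots per field (two for the triangles, three for the circles); the function name is hypothetical. V is left out because it also depends on the number of fields per village, which was not the same in every village.

```python
def field_and_plot_components(ms_fields, ms_plots, plots_per_field):
    """Return (F, P) from the mean squares, assuming a balanced nested design."""
    p = ms_plots
    f = (ms_fields - ms_plots) / plots_per_field
    return f, p


# Irrigated wheat, Table 4: (mean square bet. fields, mean square bet. plots, k).
irrigated = {
    "33' triangle": (1288, 479, 2),
    "16.5' triangle": (2208, 834, 2),
    "8.25' triangle": (2878, 909, 2),
    "3' circle": (3743, 766, 3),
    "2' circle": (4388, 1445, 3),
}

irrigated_f = {name: field_and_plot_components(*row)[0]
               for name, row in irrigated.items()}

for name, f in irrigated_f.items():
    print(f"{name}: F = {f:.1f}")
```

For the irrigated series this simple relation agrees with the tabulated F values to within about one unit; where the numbers of plots per field were unequal, as in part of the unirrigated series, somewhat larger differences are to be expected.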


INVESTIGATION ON PADDY IN GAYA 


As the above finding was totally unexpected, and as one can always 
argue that bias might have been prevented by better field work, further 
investigations were carried out on other crops and in other provinces. 
One of these was carried out on paddy (rice) in Gaya district in Bihar, 
where Hubback carried out his first series of experiments by the random 
sampling method. Gaya district has a geographical area of 4,766 sq. 
miles, of which approximately 35% is under paddy, and is divided into 
4 subdivisions. The plan of sampling consisted in selecting at random 
108 villages, 36 from the subdivision of Gaya and 24 from each of the 
other three subdivisions. In each selected village two paddy-growing 
fields were selected and in each selected field five plots were marked at
random: (a) one rectangle of size 33’ × 16½’ (area: 544.5 sq. ft.); (b) two
isosceles right-angled triangles with the equal sides each equal to 5’
(area 12.5 sq. ft.); and (c) two equilateral triangles of side 15 links
(area: 42.4 sq. ft.). In addition it was proposed to harvest the whole of
the field, but this was dropped on the advice of the provincial govern- 
ment who apprehended that in view of the levy scheme in progress the 
intentions in harvesting the whole field were likely to be misunder- 
stood by the cultivators and might result in non-cooperation on their 
part. The plots (a) and (c) were marked with the help of tapes and 
pegs, but the plot (b) was marked with the portable apparatus devised 
and used by Mahalanobis in the previous year in this province [9]. All 
the plots in the field were located by means of pairs of random numbers, 
but were marked and harvested in the order starting from the southwest 
corner of the field. The experiments were carried out by the Circle 
Officers posted in the district in connection with the scheme of com- 
plete field to field enumeration of acreage under the crops. 
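
As a check on the plot dimensions quoted above, the areas can be recomputed from elementary geometry. The sketch below is illustrative only: it assumes the usual conversion of 1 link = 0.66 ft. (so that 15 links = 9.9 ft.) and 43,560 sq. ft. to the acre, neither of which is stated in the text, and the function names are our own.

```python
import math

LINK_FT = 0.66          # assumed: one link of the surveyor's chain, in feet
SQ_FT_PER_ACRE = 43560  # assumed: square feet in one acre


def rectangle_area(length_ft, width_ft):
    """Area of a rectangular plot in square feet."""
    return length_ft * width_ft


def right_isosceles_area(leg_ft):
    """Area of a right-angled isosceles triangle with equal sides leg_ft."""
    return 0.5 * leg_ft ** 2


def equilateral_area(side_ft):
    """Area of an equilateral triangle of the given side."""
    return math.sqrt(3) / 4 * side_ft ** 2


areas = {
    "(a) rectangle 33' x 16.5'": rectangle_area(33, 16.5),
    "(b) right-angled triangle, equal sides 5'": right_isosceles_area(5),
    "(c) equilateral triangle, side 15 links": equilateral_area(15 * LINK_FT),
}

for name, area in areas.items():
    share = SQ_FT_PER_ACRE / area
    print(f"{name}: {area:.1f} sq. ft. (about 1/{share:.0f} of an acre)")
```

Under these assumptions the rectangle comes to 544.5 sq. ft., or 1/80th of an acre, the right-angled triangle to 12.5 sq. ft., and the equilateral triangle to about 42.4 sq. ft.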

Before the commencement of work, all the Circle Officers were gath- 
ered at the District Headquarters and were trained in the conduct of 
the field work. The field work was supervised by the District Supervis- 
ing Officer and the Superintendent of Statistics, Bihar. The author also 
inspected the work in one village in Gaya subdivision. 

* This explanation has been suggested to us by W. G. Cochran. 

Out of a total of 108 villages selected for the investigation, experiments 
were carried out in all villages except one in the Gaya subdivision and 
three villages in the Nawadha subdivision. In two villages of the latter 
subdivision experiments were carried out in only one field in place of 
two as the crop in the other field was harvested by the cultivator before 
the date fixed for the purpose. Altogether, out of a total of 1,080 plots 
proposed to be harvested under the scheme, 1,030 were harvested. 
Table 5 shows estimates of the average yield in pounds per acre from 
plots of different sizes. The results are shown separately for each sub- 
division and for the district as a whole. The latter were calculated by 
combining the subdivision estimates in proportion to the area under 
paddy in the different subdivisions. The yield estimates decrease as 
the plot size increases. The table shows that the yield estimate derived 
from plots of 12.5 sq. ft. is an over-estimate by 23 per cent. Even plots 
of 42.4 sq. ft. give an over-estimate of nearly 9 per cent. In both cases 
the yield estimates are found to be significantly higher than the one 
given by the rectangular plot. The tendency to over-estimate with 
12.5 sq. ft. is seen to be consistent in all subdivisions. The result con- 
firms the results of the Moradabad investigation that the use of small- 
size plots such as are marked by means of a portable apparatus, in the 
hands of the departmental staff who are ordinarily required to carry out 
these experiments, is attended with risk. The results must be considered 
to be remarkable as no less than 17 Officers worked independently of 
each other. 


TABLE 6 
ANALYSIS OF VARIANCE OF PLOT YIELDS IN (10 LBS./ACRE)² 

                               Mean square             Mean square             Mean square
Size and shape of plot         bet. villages     V     bet. fields       F     bet. plots      P

I.   Rectangle 33’ × 16½’       4922 (100)*     1706    1544 (102)     1544        -           -

II.  Equilateral triangle
     of side 9.9’              14529 (100)      2496    4641 (102)     2173     294 (206)     294

III. Isosceles right-angled
     triangle, equal sides 5’  17378 (100)      3076    5193 (102)     2211     772 (206)     772

* Figures in brackets give the degrees of freedom. 

Table 6 gives the estimates of the mean square between villages, be- 
tween fields within villages and between plots within fields, as also of 
the V, F, and P components derived therefrom. Again, V, F, and P all 
increase as the plot size decreases, in agreement with the result ob- 
served in the Moradabad investigation. 


INVESTIGATION ON PADDY IN KISTNA 

It is not proposed to describe here the results of all the investigations 
that have been carried out to date, but the results of one investigation 
carried out in the Kistna district of the Madras province simultane- 
ously with the investigation in Gaya district described above, would 
appear to be of particular interest. For, unlike the investigations in 
Moradabad and Gaya, the whole of the field was harvested in Kistna. 
Another distinguishing feature of the Kistna investigation was that the 
field staff was drawn from the Department of Agriculture and were all 
agricultural graduates. The plan of sampling consisted in selecting a 
total of 36 villages, distributed equally among the six talukas selected 
for the survey. In each selected village, three fields were selected at 
random out of all the paddy-growing fields in the village, and within 
each field the following plots were marked: 

(a) One rectangle of 1/100th of an acre (50 links × 20 links) which is 
the official plot size adopted in Madras province; 

(b) two circles of radius 3’ each; 

(c) two circles of radius 2’ each; and 

(d) two equilateral triangles of side 5’ each. 

In addition, the whole of the remaining field was harvested. The gen- 
eral procedure was similar to that in previous schemes. 

The field staff were trained in advance and were asked to exercise 
all possible precautions in regard to border plants. 

Table 7 shows the results. There is a close agreement in all the 
talukas between the yield estimates obtained from the standard 
1/100th acre plots and those obtained from harvesting the whole field. 
In none of the talukas, nor in the district as a whole, is the difference 
more than can be explained by the sampling error, indicating that 
1/100th acre plots gave unbiased estimates of the yield per acre. This 
corroborates the result of the previous investigation on paddy in the 
Tanjore district of the province [2]. The yield estimates from small 
size plots are, however, considerably different from those obtained by 
harvesting the whole field in most talukas. The bias observed is not 
always positive (over-estimate), but in a few cases is negative (under- 
estimate). The fact to be noted is that the bias does not cancel out 
when the results of a large number of different investigators are pooled 
for the district. Table 8 shows the results of the statistical comparison 
of the district estimates of the average yield. The yield estimates from 
the rectangular plot of 1/100th of an acre are statistically in agreement 
with those obtained from the whole field, but the yield estimates ob- 
tained from the small-size plots are significantly different from the 
latter. The results are thus in agreement with those observed in the 
investigations in Moradabad and Gaya. 
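
The test statistics in the last column of Table 8 can be reproduced by dividing each difference by its standard error. What follows is a minimal sketch rather than the computation actually used in the investigation; the critical value of 1.98 is an assumed approximation to the two-sided 5 per cent point of t for 107 degrees of freedom, and the names are illustrative.

```python
# (difference in lbs./acre, standard error of difference), as printed in Table 8
comparisons = {
    "rectangle vs. whole field": (10.25, 15.34),
    "rectangle vs. 3' circle": (-74.30, 34.29),
    "rectangle vs. 2' circle": (-175.94, 73.22),
    "rectangle vs. 5' triangle": (-492.37, 93.25),
}

# Assumed: approximate two-sided 5% point of t with 107 degrees of freedom.
T_CRIT_5PCT = 1.98


def t_ratio(difference, standard_error):
    """Absolute ratio of a difference to its standard error."""
    return abs(difference) / standard_error


for name, (diff, se) in comparisons.items():
    t = t_ratio(diff, se)
    verdict = "significant" if t > T_CRIT_5PCT else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at the 5% level)")
```

Only the comparison with the whole field falls below the assumed critical value, which is the pattern described in the text.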

TABLE 8 

RESULTS OF THE COMPARISON OF ESTIMATES OF AVERAGE 
YIELDS FOR DIFFERENT SIZES OF PLOT 

                              Difference in    Standard      Degrees
                              average yield    error of      of
                              in lbs./acre     difference    freedom       t

Rectangle vs. whole field          10.25          15.34         107        .67
Rectangle vs. 3’ circle          - 74.30          34.29         107       2.17**
Rectangle vs. 2’ circle          -175.94          73.22         107       2.40**
Rectangle vs. 5’ triangle        -492.37          93.25         107       5.28**


Table 9 gives estimates of the mean squares between villages, be- 
tween fields within villages, and between plots within fields, and of the 
V, F and P components of the variance. It shows that V, F and P all 
increase as the plot size decreases. The results are thus similar to those 
observed in the case of the Moradabad and Gaya investigations and 
show that small size plots do not give unbiased estimates of the true 
variance between villages and between fields within villages. 


TABLE 9 

                            Whole field       50 × 20 (links)²  3’ circle         2’ circle         5’ triangle
                            d.f.     m.s.     d.f.     m.s.     d.f.     m.s.     d.f.     m.s.     d.f.     m.s.

Mean square bet. villages     30    11756       30    11545       30    32340       30    60050       30    53449
V                                    3272              3207              4321              8770              7297
Mean square bet. fields       72     1940       72     1924       72     6414       72     7432       72     9666
F                                    1940              1924              3190              3634              4639
Mean square bet. plots                                           108       34      108      164      108      387
P                                                                          34               164               387
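
The components in Table 9 appear to follow from the mean squares by the usual rules for a balanced nested classification. The sketch below is offered on that assumption and is not taken from the investigation itself: it supposes 3 fields per village and 2 small plots of each kind per field, as in the plan of sampling, and it treats the whole field and the 1/100th acre rectangle as a single plot per field, so that F cannot be separated from P for them. The function and variable names are illustrative.

```python
def variance_components(ms_villages, ms_fields, ms_plots=None,
                        fields_per_village=3, plots_per_field=1):
    """Return (V, F, P) from the mean squares of a balanced nested design.

    With a single plot per field (ms_plots is None) the plot component
    cannot be separated, and F is simply the mean square between fields.
    """
    within = 0.0 if ms_plots is None else ms_plots
    f = (ms_fields - within) / plots_per_field
    v = (ms_villages - ms_fields) / (fields_per_village * plots_per_field)
    return v, f, ms_plots


# (MS bet. villages, MS bet. fields, MS bet. plots, plots per field), Table 9
table9 = {
    "whole field": (11756, 1940, None, 1),
    "50 x 20 links rectangle": (11545, 1924, None, 1),
    "3' circle": (32340, 6414, 34, 2),
    "2' circle": (60050, 7432, 164, 2),
    "5' triangle": (53449, 9666, 387, 2),
}

# (V, F) as printed in Table 9, for comparison
published = {
    "whole field": (3272, 1940),
    "50 x 20 links rectangle": (3207, 1924),
    "3' circle": (4321, 3190),
    "2' circle": (8770, 3634),
    "5' triangle": (7297, 4639),
}

for name, (msv, msf, msp, k) in table9.items():
    v, f, p = variance_components(msv, msf, msp, plots_per_field=k)
    pv, pf = published[name]
    print(f"{name:24s} V = {v:7.1f} (printed {pv})  F = {f:7.1f} (printed {pf})")
```

Under these assumptions the computed values agree with the printed V and F to within one unit of rounding.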








To conclude, the results of the investigation show that small size 
plots such as are marked by a portable frame and whose produce can 
be handled by the experimenter himself without the help of hired labour 
cannot be depended upon to give unbiased yield estimates. The risk 
is particularly serious where, as in India, crops are unevenly sown and 
experiments have to be carried out by the staff of the Departments of 
Revenue and Agriculture in the course of their normal duties. In con- 
trast, large plots of the order of 1/80th of an acre appear to be free from 
bias. 


IRVING FISHER 1867-1947 


PROFESSOR IRVING FISHER, President of the American Statistical 
Association in 1932, died on April 29, 1947. Happily, the Associa- 
tion had given a dinner in his honor at Atlantic City during the last 
annual meeting. The tribute paid to him by Professor Frisch was most 
felicitous, and the presence of so many members of the Association was 
a convincing testimonial. In his response, Professor Fisher gave a 
demonstration that his keenness and wit were still unimpaired. His life 
was one of continuous contribution to many fields, and of encourage- 
ment and stimulation to others to do likewise. His influence remains in 
the minds and purposes of all of us. 
WILLARD L. THORP 
President 
Washington, D. C. 
June 1947 


BOOK REVIEWS 


Edited by 
OSCAR KRISEN BUROS 
Rutgers University 


De Gedachtengang van de Statistica. [Methodology of Statistics.] S. T. Bok 
(Head, Statistical Department of the Institute for Preventive Medicine, Leiden, 
Netherlands). Proceedings of the Institute for Preventive Medicine, No. 3. 
Leiden, Netherlands: H. E. Stenfert Kroese Uitgevers-Maatschappij N.V. (14 
Breestraat), 1946. Pp. vii, 155. 


REVIEW BY GERHARD TINTNER 
Professor of Economics and Mathematics 
Iowa State College 


THIS is a short introduction to statistical methods for medical students. 
It deals very competently with some of the fundamental ideas and pro- 
cedures of modern statistics. The point of view is decidedly Fisherian. The 
examples are taken from medicine and biology. 

No proofs of the various mathematical theorems are given. No tables are 
provided but some useful nomographs which, however, do not seem to give 
as accurate results as the usual tables. 

This book compares very favorably with similar textbooks of medical 
statistics published in England and America in spite of the fact that it is 
much shorter than most of these. The emphasis on methodology is particu- 
larly commendable. 

One slight criticism seems to be in order. The almost universally accepted 
symbol for the criterion of goodness of fit in the literature is χ². It is here 
denoted by G*. This seems an unnecessary innovation which will make it 
more difficult for Dutch students of statistics to follow the literature pub- 
lished in the English language. The lack of uniformity of notation in statistics 
has been widely deplored, and there seems to be no good reason to compli- 
cate matters further by adopting new symbols. 

Another more serious shortcoming is that no literature is quoted at all. 
No books are indicated which the student should read after having mastered 
this introduction. 


Examination of Industrial Measurements. John W. Dudley, Jr. New York 18: 
McGraw-Hill Book Co., Inc. (330 West 42nd St.), 1946. Pp. ix, 113. $2.00. 
Two reviews follow: 


REVIEW BY C. WEST CHURCHMAN 
Assistant Professor of Philosophy 
University of Pennsylvania 


THIS book provides in very simple language a statistical manual for the 
product control engineer. It contains a very elementary discussion of the 
mathematical theory underlying the application of control charts, and a 
consideration of a test of difference of means, a test based on runs, and a brief 
discussion of curve fitting. 

Throughout, the language of the little volume is designed for the engineer 
who wishes to be “eased” into his mathematical thinking. The development 
is gradual, and the points are extremely well illustrated. 

Since the book is designed for very widespread use among persons un- 
familiar with statistical methods, it suffers considerably from loose or un- 
qualified terminology and symbolism, apt to deceive the unwary into an il- 
logical application of the methods. For example, it is certainly unwise to 
take the “betting odds” for limits recommended by A.S.T.M. and A.S.A. 
as “standards” (p. 6) since the odds differ so widely depending on the 
product. The specification for a sample size between 300 and 600 for making 
“reliable estimates” (p. 11) for large lots is also arbitrary. The discussion of 
randomness (pp. 17-18) is far too brief for any real understanding, and is 
better eliminated or else replaced by a more careful analysis of the meaning 
of the randomness presupposition relative to the application of the methods. 

Perhaps the greatest flaw, however, occurs in using the same symbol for 
sample estimates and for the true values of the parameters of a population. 
Thus, the empirical formulas for X̄ and σ (pp. 23-4) are incompatible with 
the formula for the normal curve (p. 32), and the formulas for its areas 
(p. 36). So often, especially in quality control manuals, an appeal is made to 
“large sample theory” as an excuse for a failure to distinguish between 
estimates and true values, as though there were a different logic associated with 
large samples, from the logic we associate with small samples. The careless- 
ness of terminology may be less important for large samples, but this point 
can scarcely be established without some rigorous analysis to begin with. 
And it is not difficult to discuss the nature of the true parameters of a popula- 
tion independent of the calculus. The failure to make the mathematical 
model explicit is apt to lead the layman into a very serious misconception of 
the underlying logic of statistical methods in quality control. 

In sum, the book is an adequate summary for the engineer who does not 
plan to apply statistical methods, but merely wants to know what is “going 
on” when such methods are applied; it is a very poor introduction for the 
engineer who wishes to know the logical basis for the proper application of 
statistical methods in quality control. 


REVIEW BY JOSEPH MANUELE, Director of Quality Control, and 
Roscoe Byrsrs, Quality Control Staff Assistant 
Westinghouse Electric Corporation, Pittsburgh, Pennsylvania 


IN THIS short treatise, which can be read in one evening, the author, in his 
own words, “attempts to acquaint the engineer with some useful, simple, 
and adaptable statistical tools and to enable him to collect data in such a 
manner that they can be subjected to more elaborate analyses when such 
analyses are necessary and justified.” The result is that he succeeds in 
acquainting the reader with the standard source material on the various 
aspects of statistical techniques. The bibliography given at the end of the 
book is complete, and the author makes reference to the various authors 
throughout his book. The author neither develops any new methods in 
statistics nor presents completely the methods which have been perfected 
by others. His general method is to present a problem, show the reader the 
general method of solution, and then refer him to the proper text in the 
bibliography, where a complete discussion may be found. 

The author makes a strong case, and justly so, for the benefits to be derived 
from the application of statistical methods, but the reader had better be 
prepared to do a considerable amount of collateral reading if he wishes to 
master this subject, or even become familiar with it. 

The author starts out by describing the two general methods of gathering 
information about manufactured products—namely by attributes, or frac- 
tion defective, and by variables, or direct measurement. 

For the use of variables the all important subject of order is well discussed 
by means of examples and the point is brought out that when order of oc- 
currence of measurements in relation to the process is preserved, it is very 
possible to discover assignable causes of variation. 

In addition to the condition of order, two other considerations are 
discussed as being paramount in presenting and examining industrial 
measurement: (a) central tendency and (b) extent of variation or measure of 
dispersion. This subject of central tendency and extent of variation could 
receive more attention because of its importance. However, authors in 
general pass it off with a paragraph or two, leaving the reader to acquire 
understanding through experience. 

The text then treats, though somewhat lightly, the Shewhart average, 
range, standard deviation, and fraction defective control charts with some 
practical examples. The tables of factors for computing control chart lines 
have been reproduced from the A.S.T.M. Manual on Presentation of Data. 

A discussion of quartiles and their uses in estimation of dispersion is pre- 
sented for those who dislike computations necessitating squares and square 
roots. Also, the use of quartiles for data appreciably skewed is brought out 
with interest. 

In the discussion of the theory of sampling, neither probability nor risk 
is defined or mentioned in any manner. No presentation of the subject of 
sampling is complete without at least sounding a warning of the risks in- 
volved; the author would have done well to have mentioned them. 

The subject of runs is treated in an interesting manner, and a few rules-of- 
thumb are given which, if applied, will enable the engineer to detect and 
question suspiciously long runs. No distinction is made between “runs up” 
and “runs down.” The only distinction is between runs above the central 
value and runs below the central value. 

Curve fitting with lines of variation for control purposes is nicely pre- 
sented by the author. By following through in his treatment of curve fitting, 
to computing control lines of variation, the reader not only should gain a 
means of finding assignable causes but also will have a simple check on the 
goodness of his curve fit. 

Although nothing new is developed, the text is easy reading for the begin- 
ner with good recommendations for further reading. It is a good book for 
the beginner to have around, for it tells him where to go to get the answers 
to his problems in the application of statistical methods. The bibliography 
lists the books which should be in the bookcase of anyone wishing to estab- 
lish a personal library of the best in statistical literature. 


Statistical Analysis for Students in Psychology and Education. Allen L. Edwards 
(Associate Professor of Psychology, University of Washington). New York 16: 
Rinehart and Co., Inc. (232 Madison Ave.), 1946. Pp. xviii, 360. $3.50. 


REVIEW BY J. H. CURTISS 
Assistant to the Director, National Bureau of Standards 
Washington, D. C. 


THIS text makes a determined effort, somewhat in the spirit of the earlier 
book by Peters and Van Voorhis (Statistical Procedures and Their 
Mathematical Bases, 1940), to present the modern point of view in statistical 
methodology to students of the behavioral sciences. It requires of its audi- 
ence practically no mathematical attainments and is self-contained to the 
point of including a 15-page review of the rules of elementary arithmetic. 
These facts, and the increasing frequency with which references to Peters 
and Van Voorhis now appear in the literature, suggest that many teachers of 
psychology and education may wish to consider this text for classroom 
adoption. It seems well worth while therefore to supplement the critique of 
the book previously published in this review section by Professor David A. 
Grant with a number of further remarks on the organization and character 
of the text. 

After an introduction which gives a brief glimpse of the goals of statistical 
analysis, the previously mentioned review of arithmetic is given. There fol- 
lows about one hundred pages of straightforward exposition of classical de- 
scriptive statistics, concluding with two chapters on measures of bivariate 
association. From this point onward, the viewpoint is based entirely on 
modern sampling theory, except in the discussion of regression in the next 
to the last chapter. After an overly abbreviated discussion of probability 
and frequency distributions, the concept of sampling is introduced by refer- 
ring to the work of opinion-polling organizations. The distributions of the 
mean and of Student’s t are discussed, fiducial limits are described, and 
various classical standard errors are catalogued. A more intensive study of 
the t distribution is next undertaken via an experiment with paired 
observations, and a good discussion is given of the advantages and disadvantages of 
pairing observations. Two chapters follow on the analysis of variance; the 
second studies the application to several matched groups. The usual test 
for linearity of regression is thrown in here, although it would seem that 
this material should be relegated to the chapter on regression. The usual 
applications of χ² are next discussed. Then comes the chapter on regression; 
here the purely descriptive approach is something of a letdown. The book 
concludes with a set of comments on the general problem of applying the 
models of analytical statistics to the experimental situations encountered in 
the social sciences. 

The text is written in an informal style with frequent use of the second 
person. (“Perhaps you are wondering what a set of scores yielding a very 
small correlation coefficient would look like when plotted.”) The author 
seems to have followed a well-thought-out plan of graduating the difficulty 
of concept and rigor as the book progresses. Thus on page 83, he permits 
himself to say, “When the relationship between two variables is not perfect, 
then we have two regression lines,” but more careful language is used later 
when similar material is discussed. There is an abundance of interesting 
exercises for the student. Tables of squares, random numbers, the normal 
curve, t, r, F, and é are given at the end of the book where they can easily 
be found. 

This reviewer can offer only a lay opinion on the author’s choice of topics. 
There are certainly some rather spectacular omissions, such as the failure 
to cover the problem of the reliability and validity of tests; these will 
surely be criticized adversely by teachers in the behavioral sciences. Professor 
Grant’s review makes some mention of such matters. The author points out 
in his preface that he is quite aware of the eccentricities in his program. It 
nevertheless seems to this reviewer in the light of experience in other fields 
that the author has written an up-to-date text at the level at which he set 
out to work and has done a good job of organizing the material. 

The shortcomings of the book seem to fall into two categories: those which 
arise from the mathematical deficiencies of the intended audience and those 
which arise from an uncritical use of the available reference literature. 

The author’s usual method of exposition consists in presenting a formula 
in full-fledged mathematical notation (but without proofs) and in then fol- 
lowing up with excruciatingly detailed demonstrations of how to substitute 
into the formula. Occasionally, also, a simple algebraic development is given, 
always in very easy steps. Although Professor Grant, in his review, expresses 
keen disappointment at finding so much “busywork” in the text (and indeed, 
the author, using this word in the preface, openly promises there to avoid 
such goings on), nevertheless this method of pedagogy will find supporters, 
especially in the physical sciences. Many a research worker in physics has 
sat in his laboratory performing low-grade mathematical calculations with 
pencil and paper while his brain successfully probed the real inner meaning 
of his experiment. 

It seems less excusable that the author, in his attempt to meet the limita- 
tions of his readers, fails to give enough elementary distribution theory to 
enable him really to put across any of the descriptive concepts connected 
with continuous distributions. The thought seems to be present in his mind 
that the meaning of a continuous distribution will be clear to the student 
from outside sources of information. In any case, the “normal distribution,” 
identified by that name, suddenly appears out of the blue sky on pages 39 
and 40 in the midst of very elementary talk about the mean and variance of a 
set of numbers, before frequency distributions have been defined. Indeed, 
there is some room for doubt as to whether the author himself is entirely 
clear about the meaning of a distribution curve, because on page 141 he an- 
nounces twice that the ordinate of the normal curve equals the frequency at 
the corresponding point on the “base line,” and his method of curve fitting 
(p. 143) seems somehow to involve fitting ordinates rather than areas. 
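
The distinction between ordinates and areas can be made concrete with a small numerical sketch. The figures below are hypothetical and are not taken from the book under review: 200 scores are supposed to follow a normal curve with mean 50 and standard deviation 10, grouped in class intervals of width 10. The expected frequency in an interval is the number of cases times the area under the curve over that interval; the ordinate alone is a density, and even when multiplied by the interval width and the number of cases it only approximates that area.

```python
import math


def normal_pdf(x, mean, sd):
    """Ordinate (density) of the normal curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(x, mean, sd):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def expected_by_area(lo, hi, n, mean, sd):
    """Expected frequency in [lo, hi] from the area under the curve."""
    return n * (normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd))


def expected_by_ordinate(lo, hi, n, mean, sd):
    """Approximation: mid-interval ordinate times width times n."""
    mid = (lo + hi) / 2
    return n * (hi - lo) * normal_pdf(mid, mean, sd)


# Hypothetical example: n, mean, sd and the class limits are assumptions.
n, mean, sd = 200, 50.0, 10.0
edges = [20, 30, 40, 50, 60, 70, 80]

for lo, hi in zip(edges, edges[1:]):
    by_area = expected_by_area(lo, hi, n, mean, sd)
    by_ordinate = expected_by_ordinate(lo, hi, n, mean, sd)
    print(f"{lo}-{hi}: area {by_area:6.2f}   ordinate x width {by_ordinate:6.2f}")
```

The two columns differ in every interval, and the discrepancy grows as the class intervals are widened relative to the standard deviation.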

But perhaps he got that idea from the literature on which he relies so 
heavily, because the concept appears on pages 299, 301, and 367 of Peters 
and Van Voorhis. This reviewer found the rather abject dependence of the 
author on the reference literature, and in particular on secondary sources 
such as Peters and Van Voorhis, most annoying. No less than 102 references 
are listed in the course of the book. From time to time, apparently quite 
trivial assertions are supported by references; take, for example, this thought 
for the week, which appears on page 277: “To be sure, as Lynd (66) has 
stated, not all questions and consequently not all research problems are of 
equal significance.” (Reference (66) is to R. S. Lynd’s Knowledge for What? 
Princeton University Press, 1939.) It is probably too much to expect that 
the author of a text on statistical methods in the behavioral sciences should 
be a highly trained mathematical statistician, but if such an author chooses 
to proceed in a completely derivative manner, the state of the literature is 
such that he might do well to seek guidance and criticism from an expert. 
It might be well to point out in this connection that Peters and Van Voorhis 
has been severely criticized on grounds of fact and rigor; the reader is re- 
ferred, e.g., to the review by F. F. Stephan in the American Mathematical 
Monthly for December 1941. In addition to the identification of ordinates 
with frequencies mentioned above, a number of the other errors in Peters 
and Van Voorhis pointed out by Stephan are carried over in this book. 

The following additional criticisms may be worth noting here: (a) The 
advice given on page 201 to the effect that a significant F test can be 
pursued by specific comparisons with the t test needs careful qualification, (b) 
the additivity and independence assumptions inherent in the analysis of 
variance model are never clearly stated, and (c) no information is given in the 
text or tables as to whether the tabulated distributions of r, t, F, and é& are 
one-tail or two-tail tables, and no instructions are given for coping with 
negative r and e’. 

Perhaps these errors and omissions can be corrected in a second edition. 
The really important thing is that the author has made a serious and care- 
fully planned approach to the problem of presenting the modern point of 
view at a low level of mathematical sophistication. It is the opinion of the 
reviewer that the book should prove to be a successful classroom text, 
provided that the omission of certain subjects held in high esteem by teachers 
of the behavioral sciences does not limit the usefulness of the text too much. 
From the statistical point of view, in spite of the blemishes pointed out 
above, it is as reliable as any of the currently popular textbooks in its field, 
and the mistakes are more than balanced by the progressive, forward-look- 
ing point of view. Moreover, quite proper emphasis is given to the im- 
portance of providing insight into the rationale of the statistical method. 


Statistics in Psychology and Education, Third Edition. Henry E. Garrett (Pro- 
fessor of Psychology, Columbia University). New York 3: Longmans, Green & 
Co., Inc. (55 Fifth Ave.), 1947. Pp. xiii, 465. $4.00. Two reviews follow: 


REVIEW BY JOSEPH F. Day 
Bureau of the Census, Washington, D. C. 


THOSE familiar with the earlier editions of this well-known text will find it 
still quite recognizable in spite of its several revisions. As pointed out 
in the introduction, the book is intended for the use of those whose main 
interests lie in the subject matter of the social sciences but who need to 
know at least the rudiments of statistical inference in order to conduct their 
research. Consequently, the author confines his discussion almost entirely 
to those topics which have the widest range of applicability—to measures of 
central tendency, of dispersion, and of correlation; to methods of scaling 
distributions; and to elementary tests of statistical hypotheses. The book is 
neither a collection of formulae nor a series of closely knit arguments. It 
requires very little of the student in the way of mathematical background 
and yet maintains a reasonably successful compromise between readability 
and rigor. There are numerous problems (together in general with their 
answers) to illustrate the text. 

There are, of course, some faults in the fourteen chapters. For example, 
in Chapter 2, in spite of a helpful summary telling when to use the various 
measures of central tendency, one is left with the false impression that the 
median, mean, and mode are always estimates of the same quantity and 
that the mean is always the most reliable estimate of it. Again on page 70, 
there appears a statement which sounds a bit strange to a mathematical 
statistician: “This result [that the mean of the sample is 4.77], while mathe- 
matically correct, is rather difficult to interpret in a practical way, as it is 
obviously impossible for a family to have four and a fraction of children.” 
This reviewer believes that it would have been better to take the occasion 
to point out that a mean value is a property of a distribution and not in 
general an attribute of a single observation. In Chapter 5 the author has not 
been entirely successful in his presentation of the uses and limitations of the 
normal probability curve; for in spite of the obvious care which went into 
the preparation of this chapter, the untrained reader is likely to infer that 
normality and symmetry are equivalent. This is a chapter that must be 
read in its entirety (footnotes included) before conclusions are drawn from it. 
The discussion on sampling and reliability represents a step in the right 
direction, but it seems unfortunate that more was not said about actual 
sampling techniques. The definition of random sampling on page 223 could 
easily have been made accurate with the aid of a short discussion on cluster 
sampling, stratified sampling, and quota sampling. Finally, the author has 
taken an apparently unnecessary liberty with standard terminology in 
Chapter 8, where on page 234 he confines the term “null hypothesis” to the 
case in which the difference between means is assumed to be zero. 

Notwithstanding these defects, the book serves its intended purpose well 
and should be of definite value in helping the nonmathematician “to under- 
stand, or withstand, the statistics that are constantly thrown at him in print 
or conversation.” 


REVIEW BY WALTER L. DEEMER, JR. 
Chief, Department of Statistics, AAF School of Aviation Medicine 
Randolph Field, Texas 


THE second edition of this book (1941) represented a rather complete 
revision of the first edition (1926). The edition being reviewed is not as 
complete a revision. Some of the more important changes include the addi- 
tion of chapters entitled “Testing Experimental Hypotheses” and “Multiple 
Correlation in Test Selection,” and the deletion of Blakeman’s test for 
curvilinearity of regression and the substitution of a chi-square test. As in the 
earlier editions, each chapter is followed by problems. 

This book has been very widely used in elementary courses in educational 
and psychological statistics. The principal reason for the wide use of this 
text is probably the detailed, well-written exposition of many of the topics 
that are taken up in elementary courses. Many other topics which might be 
expected in an elementary text, however, either are not mentioned at all 
or are given only brief mention and little or no explanation. 

An elementary text should give a balanced treatment of all the important 
topics to be covered in a first course in statistics. It is hardly sufficient to add 
a few pages about new material to an old text. Whereas the first edition of 
this text was well balanced with respect to space allotted various subjects, 
the present edition has far too much space allotted to what might be called 
“the old statistics” and far too little devoted to developments of the last 
twenty years. 

There are dozens of textbooks in elementary statistics, many of them 
expressly designed for students of psychology and education. It seems rea- 
sonable to require a new book to justify itself either by presenting something 
new, or by gathering old material from a variety of sources, or by presenting 
old material in a better way. The book under review does not meet these 
criteria. 

As a matter of fact, it is somewhat difficult to see the justification for the 
new edition of this text. Earlier editions suffered because of neglect of analysis 
of variance and covariance and small sample theory in general; cumber- 
some formulas for partial and multiple correlation; neglect of the concept of 
confidence intervals; emphasis on standard errors without consideration of 
sampling distribution; neglect of the logical foundations of tests of statistical 
hypotheses; neglect of the concept of the power function of a statistical test. 
The third edition suffers from these same faults. True, some paragraphs on 
these subjects have been included, but the treatment is still either missing, 
inadequate, or actually misleading for the following topics: the t test; degrees 
of freedom; analysis of variance; analysis of covariance; tests of statistical 
hypotheses; confidence intervals; sampling methods; and the power func- 
tion of a statistical test. There is no description of how to present data in 
tables. No mention is made of factor analysis. The emphasis remains on 
standard errors without regard for the sampling distribution. For example, 
the standard error of the standard deviation is used for interpreting the 
statistical significance of an obtained standard deviation as if the standard 
deviation were normally distributed (though the anormality of the sampling 
distribution of r is properly noted by the author and the Fisher z transforma- 
tion is recommended). 

The analysis of variance is given an inadequate eleven pages. Analysis of 
covariance is not mentioned. 

Sheppard’s correction for grouping is not mentioned. Instead, there is the 
statement, “usually the error which results from grouping is so small that 
it may be neglected in ordinary statistical work.” 

The t test is introduced with the statement, “In small samples, the normal 
curve no longer tells us accurately the probability of a divergence of our 
sample mean from the population mean. The sampling distribution to be 
used when N is small is not strictly normal; its ‘shoulders’ are higher than in 
the normal curve and the probability of extreme deviations somewhat 
greater.” In addition to the incorrect statement regarding the shoulders of 
the t distribution being higher than the normal curve, an important feature 
of this passage is its failure to indicate to the student the essential fact that 
the t distribution is the sampling distribution of a statistic divided by its 
estimated standard deviation under random sampling from a normal popula- 
tion. Surely this broader concept is not too difficult to put across to begin- 
ning statistics students. 
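
The point about the shoulders can be checked numerically. The following is a minimal sketch, not taken from the book under review: it evaluates the density of Student's t and of the normal curve at a few points. The choice of 5 degrees of freedom and the reading of "shoulders" as the region near one standard unit are assumptions made for illustration.

```python
import math

def normal_pdf(x):
    """Density of the standard normal curve at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom at x."""
    coef = math.gamma((df + 1) / 2.0) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2.0)
    )
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

def compare(df=5, points=(0.0, 1.0, 2.0, 3.0)):
    """Return (x, t density, normal density) triples for a quick comparison."""
    return [(x, t_pdf(x, df), normal_pdf(x)) for x in points]

if __name__ == "__main__":
    # df = 5 is an arbitrary illustrative choice.
    for x, t_val, z_val in compare():
        print(f"x={x:.1f}  t={t_val:.4f}  normal={z_val:.4f}")
```

With these assumptions the t density lies below the normal curve at the center and near one standard unit, and above it only farther out in the tails, which is consistent with the reviewer's objection.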

In an early chapter Garrett explains that the standard error formulas he 
gives are written in terms of population values. He then gives a formula for 
the standard error of the mean “$\sigma_M = \sigma/\sqrt{N}$ (the standard error of the arith- 
metic mean when N is large).” The concept of large and small N is developed 
more fully a few pages further on, and the use of N − 1 in the denominator 
when calculating the standard deviation is quite properly explained. But 
the issue becomes confused by Garrett’s suggestion that if N − 1 is not used 
when computing the standard deviation, a later correction can be made 
when computing $\sigma_M$. Actually, of course, $\sigma_M = \sigma/\sqrt{N}$, whether the sample is 
large or small. 
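
The closing remark follows from a short calculation, sketched here under the assumption of N independent observations $x_1, \dots, x_N$ with common population variance $\sigma^2$ (the symbol $\bar{x}$ for the sample mean is introduced only for this illustration):

```latex
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)
  = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(x_i)
  = \frac{\sigma^2}{N},
\qquad
\sigma_M = \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{N}}.
```

Nothing in the calculation requires N to be large. Sample size matters only when $\sigma$ must be estimated from the data, and then it affects the estimate and the sampling distribution of the resulting ratio, not the formula itself.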

The use of N − 1 instead of N in computing the standard deviation has led 
Garrett into a rather egregious error. On page 273 he computes the correla- 
tion coefficient for five cases using the formula 

$$r = \frac{1}{N}\sum\left(\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}\right).$$


In a footnote he states, “These o’s were calculated by formula 


$$\left(\sigma = \sqrt{\frac{\sum x^2}{N-1}}\right)$$


since the samples are small.” It is interesting to note that in the second edi- 
tion the same data are used, but the r is correctly calculated. 
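
The size of the error can be illustrated with a small sketch. The data below are hypothetical (they are not Garrett's five cases) and are chosen to lie exactly on a line, so that a correctly computed r is 1. The function names are likewise invented for the illustration.

```python
import math

def deviations(values):
    """Deviations of each value from the arithmetic mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def sd(values, divisor):
    """Standard deviation using the given divisor (N or N - 1)."""
    return math.sqrt(sum(d * d for d in deviations(values)) / divisor)

def r_from_formula(xs, ys, sd_divisor):
    """r = (1/N) * sum((x/sigma_x) * (y/sigma_y)), with x and y as deviations."""
    n = len(xs)
    sx, sy = sd(xs, sd_divisor), sd(ys, sd_divisor)
    dx, dy = deviations(xs), deviations(ys)
    return sum((a / sx) * (b / sy) for a, b in zip(dx, dy)) / n

# Hypothetical data: five cases lying exactly on a line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
n = len(xs)

r_consistent = r_from_formula(xs, ys, sd_divisor=n)      # sigmas computed with N
r_mixed = r_from_formula(xs, ys, sd_divisor=n - 1)       # sigmas computed with N - 1

if __name__ == "__main__":
    print(r_consistent, r_mixed)
```

Mixing the 1/N in the formula for r with sigmas computed from N − 1 shrinks r by the factor (N − 1)/N, so with five cases a perfect correlation is reported as about .80.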

In his discussion of elementary probability Garrett states, “since there 
are only two possible outcomes, a head or a tail is equally probable.” (The 
italics are Garrett’s.) This kind of statement is extremely unfortunate since 
the concepts of probability are sufficiently difficult without confusing the 
student in this way. Garrett would probably be very discouraged if his 
students used his logic here to refute an earlier statement of his, “the prob- 
ability that the sky will fall is .00,” by arguing that there are only two pos- 
sible outcomes, either the sky will fall or it won’t, so the probability of each 
outcome is a half. 

The normal distribution is not quite as universal as Garrett indicates. 
Contrary to Garrett’s assertion, weight, wages, and reaction time are not 
normally distributed. The normal curve is very important, but the student 
is likely enough to consider too many things normally distributed, and he 
needs some warning against this, not overemphasis on the normal curve. 

In his section on “Measuring Divergence from Normality,” Garrett states, 
“It is important to know (1) whether the skewness which often occurs in 
distributions of test scores and other measures represents a real divergence 
from the normal form; or (2) whether such divergence is the result of chance 
fluctuations, arising from temporary causes, and is not significant to real dis- 
crepance.” Presumably Garrett is referring to sampling fluctuations which 
would give a sample which was not normal. But if so, what does he mean by 
the phrase “arising from temporary causes”? 

In the same section the value .009 is computed from the formula 

$$Sk = \frac{3(\text{mean} - \text{median})}{\sigma},$$

and this value is called negligible. No criterion is given for determining when 
a value of Sk is negligible. Two paragraphs later appears the statement, 
“how much skewness a distribution must exhibit before it may be said to be 
significantly skewed cannot be answered until we have calculated a ‘standard 
error’ of our measure of skewness.” (Italics are Garrett’s.) It would seem 
logical to assume that “negligible” and “not significant” are two different 
concepts. If so, the student has a right to expect an explanation of the term 
“negligible,” and a criterion for deciding when a number is negligible. 

An added criticism of this section on measuring divergence from normality 
must be made of Garrett’s omission of any statement concerning the use of 
third and fourth moments for measuring skewness and kurtosis. 
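
For comparison, the measure quoted above and the moment measures can be set side by side. This is a sketch only: the function names are invented, and the use of the divisor N in all moments is an assumption rather than anything stated in the book.

```python
import math

def central_moment(values, k):
    """k-th moment about the mean, with divisor N (an assumption)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def median(values):
    """Middle value, or the mean of the two middle values."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2.0

def pearson_sk(values):
    """Sk = 3(mean - median) / sigma, the formula quoted in the review."""
    mean = sum(values) / len(values)
    sigma = math.sqrt(central_moment(values, 2))
    return 3.0 * (mean - median(values)) / sigma

def moment_skewness(values):
    """Third central moment divided by sigma cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def moment_kurtosis(values):
    """Fourth central moment divided by sigma to the fourth power."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

if __name__ == "__main__":
    skewed = [1.0, 1.0, 2.0, 2.0, 10.0]   # hypothetical scores
    print(pearson_sk(skewed), moment_skewness(skewed), moment_kurtosis(skewed))
```

Neither figure says anything about significance by itself. That still requires the sampling distribution of the measure, which is the reviewer's complaint about the word “negligible.”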

In his section on probability (p. 104), Garrett has defined the probability 
of an event as “the expected frequency of occurrence of this event among 
events of a like sort.” On page 136 he states, “Since 68.26% of the cases in the 
given distribution fall between 8 and 16, the chances are about 68 in 100 
that any score in the distribution will be found between these limits.” If 
probability is defined as Garrett has defined it, the chances (or probability) 
that any score will have a certain value has no meaning. Either the score has 
the value or it does not, and there is no “frequency of occurrence” possible. 
This is not so trivial a criticism as it might at first appear, since it is precisely 
confusion on such points as this which leads students into fallacious state- 
ments about “the probability of the true mean having a certain value.” This 
same error is made on page 188 where we find the statement, “Hence, the 
odds are 95:5 or 19:1 that any sample mean will lie within these limits.” 
What Garrett probably meant was not “any sample mean,” but “the mean 
of a randomly drawn sample.” And occasionally, for example on page 194, 
Garrett makes the error of making a probability statement about a popula- 
tion value, “We may be confident (probability is .95) that the population $\sigma$ 
is not larger than 2.61 nor smaller than 2.53 inches.” 

The treatment on page 220 of the standard error of a percentage shows a 
confusion regarding the use of population values in standard error formulas, 
though Garrett has earlier stated that the formulas are written in terms of 
population values. Garrett inserts obtained values of p and q in the formula 
for the standard error of a percentage in order to test the hypothesis that 
two obtained values of p are equal. What he should have done, of course, 
was to use the p computed from both groups combined in his formula for 
the standard error, since under his null hypothesis this is his best estimate 
of population value. 
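
The two procedures can be sketched as follows. The function names and the counts in the usage lines are hypothetical, and the unpooled form is only one reading of what “inserting obtained values of p and q” amounts to.

```python
import math

def se_diff_unpooled(p1, n1, p2, n2):
    """Standard error of p1 - p2 using each group's own obtained proportion."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_diff_pooled(x1, n1, x2, n2):
    """Standard error of p1 - p2 under the null hypothesis of a common p."""
    p = (x1 + x2) / (n1 + n2)            # p computed from both groups combined
    return math.sqrt(p * (1 - p) * (1.0 / n1 + 1.0 / n2))

def critical_ratio_pooled(x1, n1, x2, n2):
    """Difference of the two obtained proportions over the pooled standard error."""
    return (x1 / n1 - x2 / n2) / se_diff_pooled(x1, n1, x2, n2)

if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 in one group, 45 of 100 in the other.
    print(se_diff_unpooled(0.30, 100, 0.45, 100))
    print(se_diff_pooled(30, 100, 45, 100))
    print(critical_ratio_pooled(30, 100, 45, 100))
```

The two standard errors coincide when the obtained proportions are equal and drift apart as the proportions differ. The pooled form is the one consistent with the hypothesis actually being tested.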

On page 224 there appears a series of misstatements; e.g., “in small sam- 
ples, wide deviations from the mean cannot appear if the sample is truly 
representative of a normally distributed group.” By “truly representative,” 
Garrett probably means that the sample has exactly the same proportion of 
large and small deviations as the parent population. It is true that large de- 
viations will occur in only a small proportion of samples, but surely the state- 
ment that they “cannot appear” is not justified. 

This same concept leads Garrett to say later, “When working with small 
samples, therefore, deviations far removed from the mean should be dis- 
carded much as a laboratory worker throws out measures of reaction time 
which are obviously premature or delayed.” The discarding of observations 
because they were far removed from the mean was at one time a fairly com- 
mon practice, but surely today the great majority of competent research 
workers would agree with Fisher, “the rejection of extreme values is often 
advocated, and it may often happen that gross errors are thus rejected. As a 
statistical measure, however, the rejection of observations is too crude to 
be defended.”¹ Since Garrett fails to state a criterion for deciding when a 
deviation is “far removed from the mean,” it is to be hoped that few students 
will throw away observations. On the other hand, many students may be 
tempted to throw away any extreme observations which do not agree with 
their preconceived ideas of the expected values of the observations. 

In the next paragraph we find the following, “If the means and sigmas 
computed from these two independently drawn groups are of almost the 
same size, we may feel reasonably sure that both samples are representative 
of the population.” This is followed by an elaboration of the same theme. 
It is hard to reconcile Garrett’s broad research experience with this state- 
ment. It is clear that if the first sample is biased, the second also may well 
be biased and that close correspondence between the two samples shows 
nothing whatsoever about the representativeness of either one. 

The heading of the section that immediately follows is: “Reliability For- 
mulas Assume A ‘Sufficiently Large’ Sample.” From the context it appears 
that Garrett is referring to standard error formulas. This heading is a sample 
of a series of erroneous statements which follow in this paragraph. 

There are a few trivial matters which might be mentioned. The range is 
defined as “the interval between the largest and smallest scores. The range 
is found by subtracting the smallest from the largest score.” This definition 
is quite common in statistical texts, but your reviewer believes that the 
range is the above difference plus 1. For example, if you have just one score, 
the range is 1, not 0. 

On page 7 appears the statement that the interval 140-145 is easier to 
handle arithmetically than 142-147; the reason for this statement is not 
given. 

The term “guessed” or “assumed” mean is used. The term “arbitrary 
origin” is preferred by the reviewer because it is less misleading to the be- 
ginning statistics student. 

The mean deviation is written as $MD = |\sum x|/N$. This should be $\sum |x|/N$. 
This error appears a number of times. 
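
The distinction is not merely typographical, as a brief sketch with hypothetical scores shows. Here x denotes a deviation from the mean, and the function names are invented for the illustration.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean: sum(|x|) / N."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def abs_of_summed_deviations(values):
    """The form objected to above: |sum(x)| / N, where the deviations cancel."""
    mean = sum(values) / len(values)
    return abs(sum(v - mean for v in values)) / len(values)

scores = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical scores

if __name__ == "__main__":
    print(mean_deviation(scores), abs_of_summed_deviations(scores))
```

Since deviations about the mean sum to zero, the second form vanishes (up to rounding) for any set of scores, whereas the first gives the intended measure of spread.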

Paradoxically, it is Garrett’s most admirable quality as a textbook writer 
that makes his errors so dangerous; he writes lucidly and at the level of the 
beginning student, so that his erroneous statements are very convincing. 

The treatment of reliability of mental tests is excellent. It is sufficiently 
complete for an elementary text and is accurately and concisely written. 

There seems to be a general tendency for elementary texts, whose authors 
pride themselves on being nonmathematical, to give rules and formulas 
without derivations or even reasons. When the mathematical derivation can- 
not be presented, formulas can be made much more reasonable to the stu- 
dent if some intuitive reasons are given for them. In general, however, Gar- 
rett does not give either mathematical or heuristic proofs. 

¹ R. A. Fisher, “On the Mathematical Foundations of Theoretical Statistics,” Phil. Trans. Roy. 
Soc., 222A, 1922, p. 322. 
In his introduction (dated 1926) Professor Woodworth states that this 
book is intended for the statistician who is between the mathematician who 
devises instruments for handling statistical jobs and the skilled worker who 
uses calculating machines and computes the necessary statistics from 
formulas. This middleman “must have a discriminating knowledge of the kit 
of tools which the mathematician has handed him, as well as some skill in 
their actual use . . . [This book] lays out before [the middleman-statistician] 
the tools of the trade; it explains very fully and carefully the manner of 
handling each tool; it affords practice in the use of each. While it has little 
to say of the tool-maker’s art, it takes great pains to make clear the use and 
limitations of each tool.” 

The 1926 edition may have come pretty close to doing what Woodworth 
claimed, but since 1926 the tool makers have devised a great many new tools 
and Garrett certainly does not make clear the use and limitation of each of 
them. In some cases it appears that Garrett has done little more than heft the 
tool before writing about it. 


Introduction to Mathematical Statistics. Paul G. Hoel (Associate Professor of 
Mathematics, University of California at Los Angeles). New York 16: John 
Wiley & Sons, Inc. (440 Fourth Ave.), 1947. Pp. x, 258. $3.50. (London W.C. 2: 
Chapman & Hall, Ltd. [37-39 Essex St., Strand], 1947. 21s.) Two reviews follow: 


REVIEW BY J. H. CURTISS 
Assistant to the Director, National Bureau of Standards 
Washington, D.C. 


THIS book is intended to serve as a textbook for a two-semester course in 
mathematical statistics with an elementary calculus prerequisite. 
It thus enters a wide-open field, containing no serious American competitors 
except the two-volume work by Kenney, published several years ago. Ken- 
ney never found favor among the more sophisticated mathematicians and 
mathematical statisticians who have been teaching the intermediate course, 
so a new text by a well-known young mathematical statistician was almost 
certain to receive a warm welcome if the job was done at all competently. 

After a careful reading of Hoel’s book, the reviewer is willing to state that, 
in view of its general excellence, its publication is rather more than just a 
welcome event: it is the most important happening in the field of undergrad- 
uate statistical textbooks in several years. 

Classical large sample methods occupy the first seven chapters. Although 
the concepts of population and sample are introduced on page 1, and prob- 
ability on page 2, the exposition is largely traditional to the extent that for 
each topic descriptive methods are presented first and the theoretical count- 
erpart is then considered. Statistical hypotheses and tests thereof are for- 
mally introduced about halfway through this group of chapters and are 
referred to frequently thereafter. The eighth chapter deals with the familiar 
normal small-sample theory and introduces the joint distribution of $s^2$ and 
$\bar{x}$, confidence limits based on Student’s t and $\chi^2$, and tests of significance 
associated with the F distribution. A novel feature of the chapter is the 
presentation of the derivation and applications of the distribution of the 
range. 

The ninth chapter promises to branch out into less familiar material. Its 
title is “Non-Parametric Methods.” However, the teacher had better be 
prepared to answer some fairly sharp questions from the brighter members 
of his class as to exactly what is meant by the title. As far as can be gathered 
from the first page of the chapter, the author means “methods that do not 
require any knowledge concerning the form of the basic distribution func- 
tion.” After expounding on the usefulness of such an approach, he goes on 
to say, “Methods of the type just described are usually called non-para- 
metric methods because they do not involve the estimation of parametric 
values of a distribution function.” Then as the first example of such methods, 
he derives Tchebycheff’s inequality and shows how it can be used to test 
hypotheses about the population mean, given the population standard devia- 
tion! Wilks’ distribution-free tolerance limits are next discussed; but these 
too can be fitted into the parametric viewpoint. It is not until the theory 
of runs is taken up that the discussion becomes really “parametric-free.” 
Serial correlation as a measure of randomness is treated briefly at the end of 
the chapter. It would seem that the entire chapter might be better entitled 
“Distribution-free Methods,” although the serial correlation material seems 
to be no more distribution-free than the central limit theorem which ap- 
pears much earlier in the book. 
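
The use of Tchebycheff’s inequality mentioned above can be sketched as follows. This is an illustration under stated assumptions (independent observations and a known population standard deviation), not a transcription of Hoel’s treatment, and the function names are invented.

```python
import math

def chebyshev_bound(k):
    """Upper bound on P(|X - mu| >= k * sigma) for any finite-variance distribution."""
    return min(1.0, 1.0 / (k * k))

def chebyshev_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Reject mu = mu0 when the sample mean lies at least k standard errors
    from mu0, with k chosen so that the Chebyshev bound equals alpha.

    Assumes n independent observations and a KNOWN population sigma."""
    k = 1.0 / math.sqrt(alpha)
    threshold = k * sigma / math.sqrt(n)
    return abs(sample_mean - mu0) >= threshold, threshold

if __name__ == "__main__":
    # Hypothetical figures: mean of 100 observations, sigma taken as 1.
    print(chebyshev_test(0.5, 0.0, 1.0, 100))
```

The test makes no use of the form of the distribution, but it cannot be carried out without the population standard deviation, which is a parameter. That is the difficulty with calling the method “non-parametric.”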

The tenth chapter contains standard material on the application of the $\chi^2$ 
approximation to the multinomial distribution. A commendable amount of 
attention is paid to the binomial and Poisson indices of dispersion. In the 
final thirty pages of the book, the author starts to skip very lightly and 
rapidly over some important modern material, apparently with the idea 
that a speaking acquaintance at this stage of the student’s development is 
better than no acquaintance at all. He introduces the reader to the Neyman- 
Pearson theory, to maximum likelihood estimation, likelihood-ratio tests 
(incidentally deriving Bartlett’s test on the way), the technique of setting 
sample sizes to insure a given power, Dodge-Romig principles of inspection 
sampling, stratified sampling, and sequential analysis. The reviewer sympa- 
thizes with the author’s point of view in presenting this survey. There is, 
however, a disadvantage of such superficiality that the author may not have 
considered. A student who has had a course using this textbook is able truth- 
fully to say on a job application that he has studied design of experiments, 
sequential analysis, etc.; and such a statement might be quite misleading to 
a prospective employer. 

In his use of the moment generating function as the principal tool for 
proving the standard classical asymptotic theorems and deriving the exact 
distribution of $\bar{x}$ and $s^2$, the author follows in detail (perhaps unconsciously) 
the recommendations made by the reviewer some years ago in an article on 
the subject in the American Mathematical Monthly (48: 374-86, 1941). It was 
gratifying to find a correct mathematical explanation of the effect of always 
putting the larger variance over the smaller one in F and to observe that the 
author avoided the usual incorrect statement about kurtosis. These are just 
examples of a much more general proposition: Here at last is a text at the 
undergraduate level which is almost always right and which is written by an 
author who clearly knows what he is talking about, and a good deal more, 
too. 

There is indeed an absolute minimum of incorrect or loose statements. 
On page 13, it is announced that “For a set of data that has been obtained 
by sampling a particular type of population called a normal population, it 
will be shown later that the interval $(\bar{x} - s, \bar{x} + s)$ will usually include about 
68 per cent of the observations,” which is hardly true for small sampling; 
but trivial deviations from rigor like this are almost never encountered. Any 
really serious adverse criticism which the reviewer might have would center 
about the treatment of regression, the omission of point estimation theory, 
and the handling of some of the proofs. 
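
The remark about small samples and the 68 per cent figure can be examined by simulation. The sketch below is an illustration only: the sample sizes, the number of trials, the seed, the use of the divisor N − 1 in s, and the closed rather than open interval are all assumptions.

```python
import math
import random

def fraction_within_one_s(n, trials=2000, seed=0):
    """Average share of observations falling within one s of the sample mean,
    over repeated normal samples of size n; s uses the N - 1 divisor."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        inside += sum(1 for x in xs if abs(x - mean) <= s)
    return inside / (trials * n)

if __name__ == "__main__":
    for n in (2, 3, 5, 10, 1000):
        print(n, round(fraction_within_one_s(n, trials=500), 3))
```

With samples of two, both observations necessarily fall inside the interval, since s then exceeds half the distance between them. The familiar figure of about 68 per cent is approached only as the sample grows.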

Regression is approached via the idea of the association between two cor- 
related random variables. The difficulty is that the descriptive material does 
not seem to be tied in well enough with the succeeding theoretical material. 
It is not made really clear in the systematic formal treatment of regression 
and correlation that the “relationship” between the variables that is being 
estimated by the least square line is the locus of the means of the X-arrays 
of Y. Fifty pages later, when a small sample treatment of the standard error 
of the regression coefficient is given, it is brought home to the reader that 
this is the case and that moreover, the values of X are considered as “fixed” 
and X need not be a random variable to start with. But the vagueness of the 
initial presentation is not thereby dispelled. The reviewer was particularly 
sorry to see transformations, especially the logarithmic transformation, ap- 
plied on pages 90-91 to regression systems without any discussion at all of 
the question (all important because unweighted least-squares methods are 
apparently being recommended) of the effect of the transformation on the 
dispersion within arrays. It also seemed unfortunate that discriminant func- 
tions should be handled entirely descriptively, since the validity of the dis- 
crimination is of central interest. 

The other two points of criticism will be dealt with more briefly. The idea 
of unbiased estimate is touched on in one place, and the maximum likelihood 
method is presented with no discussion of the properties of the results ob- 
tained by using it. No general discussion of point estimates is ever given. 
Hoel’s complete preoccupation with testing hypotheses reminds the reviewer 
of the opposite example of Colonel Simon, who devotes his Engineers’ Man- 
ual of Statistical Methods almost entirely to point estimation techniques, 
paying little attention to the idea of testing hypotheses. It would seem that 
a book which sets out to be as forward-looking as this one can hardly afford 
to ignore Fisher’s rationalization of the most common way of stating an infer- 
ence from data. As for the matter of the mathematical proofs, here only a 
tentative and very cautious statement should be made. It seemed to the re- 
viewer that some of the longer demonstrations were a little confusing and 
needed some polishing up, but probably the reviewer could do no better 
himself than the author has done. Reference is made particularly to the dis- 
cussion of change of variables on page 132 (which can perhaps be handled 
more neatly via the cumulative distribution function), and the derivation of 
the distribution of the range on pages 161-164. By the way, it was puzzling 
to encounter the following sentence while wading through that demonstra- 
tion: “Since u and v are arbitrary, they may be treated as statistical varia- 
bles.” 

In general, the text is written in an entirely competent mathematical style 
which will please members of mathematics departments who have obliga- 
tions in mathematical statistics. At the same time, the exposition is firmly 
based on the applications. The reader is made constantly aware by concise 
paragraphs of explanations and by examples and exercises simulating real 
life that the mathematical theory is being set up to provide a set of models 
for a particular class of physical situations and that the selection of the ap- 
propriate model is quite as important as being able to manipulate it once the 
choice has been made. 

Undoubtedly there will be further editions of the book, incorporating re- 
finements and meeting some of the criticisms stated above. Also, more highly 
polished competitors can be expected to appear in the future. The signifi- 
cance of the present edition is that it sets an entirely new standard of mathe- 
matical and statistical competence for textbooks on intermediate mathe- 
matical statistics. It is almost certain to have a far-reaching and elevating 
effect on expository writing in mathematical statistics for undergraduates. 
Other writers of textbooks in this class will have to try to do at least as well, 
and that will be quite an assignment. 


REVIEW BY WILFRID J. DIXON 
Associate Professor of Mathematics, University of Oregon 


THERE are several well-known books devoted mainly to descriptive statis- 
tics and including discussions on design of experiments and statistical 
inference based on distributions given without proof; there are also compre- 
hensive books by Wilks, Kendall, and Cramér, covering the mathematical 
theory of statistics from a mature mathematical viewpoint. There is a very 
great need for textbooks which present the fundamental statistical theory to 
the mathematically less mature. This text has made possible the presenta- 
tion to students with only a first calculus course of a great amount of theory 
by the use of moment generating functions. Distributions of various func- 
tions of the observations are obtained by noting the corresponding changes 
in the generating function and then recognizing the resulting distribution 
function from its moment generating function. 
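
As an illustration of the technique just described (a sketch in notation chosen here, not a transcription from the text), take the mean $\bar{x}$ of n independent observations from a normal population with mean $\mu$ and variance $\sigma^2$, and write $M(\theta)$ for a moment generating function:

```latex
M_x(\theta) = E\!\left[e^{\theta x}\right] = \exp\!\left(\mu\theta + \tfrac{1}{2}\sigma^2\theta^2\right),
\qquad
M_{\bar{x}}(\theta) = \left[M_x\!\left(\frac{\theta}{n}\right)\right]^{n}
  = \exp\!\left(\mu\theta + \tfrac{1}{2}\,\frac{\sigma^2}{n}\,\theta^2\right).
```

The right-hand side is recognized as the generating function of a normal variable with mean $\mu$ and variance $\sigma^2/n$, so $\bar{x}$ has that distribution, provided the uniqueness property of generating functions is granted.
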
This book is well titled. It contains only an introduction to the field of 
mathematical statistics but does introduce the student to a wide range of the 
problems of mathematical statistics. The mathematical development makes 
extensive use of moment generating functions, which so abbreviates the 
development of the distribution theory that the student is able to see the 
results of the problem at hand before he has lost himself in the development. 
Further, the book is definitely a mathematical statistics book in that prob- 
lems of application of theory to practical situations are given secondary 
place. There is, however, a wealth of problems giving numerical practice 
with the theoretical results obtained. 

The author states that the book is intended for junior and senior science 
students. The reviewer believes that many such students will need an intro- 
ductory course in statistical method, since the introductory material on 
descriptive statistics is condensed to 20 pages (Chapters 1 and 2). Chapters 
3 to 7 (105 pages) are devoted to large sample theory, and the remainder is 
devoted to small sample theory, nonparametric methods, goodness of fit, 
testing of hypotheses, and statistical design of experiments. 

The introductory material covers classification of data and computation 
of moments. The large sample theory begins with the idea of general distribu- 
tion functions (both continuous and discrete) and moment generating func- 
tions with applications to the normal, binomial, Poisson, and multinomial 
distribution functions. The moments of each distribution are obtained from 
their generating functions. The normal and Poisson approximations to the 
binomial are also shown by the use of generating functions. The uniqueness 
and limit properties of generating functions are assumed. Distributions of 
the following statistics are obtained: the mean from normal and nonnormal 
populations, the difference of two means from normal populations, and the 
difference of two proportions. 

Chapter 5 presents the computation of the correlation coefficient and a 
discussion of regression in nonlinear situations. Chapter 6 presents the 
general distribution theory of two variables with a discussion of the normal 
surface. Chapter 7 presents the normal equations for multiple and partial 
correlation. Of particular interest is the development of linear discriminant 
functions. 

The small sample theory (Chapter 8) is introduced with the ideas of ex- 
pected values, unbiased estimates, and confidence limits. The generating 
function for $\chi^2$ is obtained, and its additive nature and the distribution of 
the sum of squares of normal variates are shown by comparison of generating 
functions. The distribution of $s^2$ is obtained by assuming the independence 
of $\bar{x}$ and $s^2$. The distribution of t is shown as a special case of $\chi^2$. Applications 
are made to testing hypotheses concerning the mean, difference of means, 
and regression coefficients. The F distribution is developed with applications 
to testing hypotheses concerning two variances, homogeneity of means, and 
simple problems in analysis of variance. The distribution of the range in 
samples from a general population is developed as an introduction to the 
nonparametric methods of Chapter 9. 
Chapter 9 presents Tchebycheff’s inequality with applications to the law 
of large numbers and a distribution-free development of tolerance limits, 
runs, and serial correlation. Repeated reference has been made in previous 
sections of the book to the importance of order in observations to lead up to 
this development. 

Chapter 10 presents the following tests of goodness of fit: $\chi^2$, binomial 
and Poisson indices of dispersion. 

Although the student has been testing hypotheses throughout most of the 
previous work, a general discussion of testing hypotheses is reserved for 
Chapter 11. The topics covered are: two types of error, efficiency, and power 
functions. The method of maximum likelihood is illustrated by estimating 
m in a Poisson distribution. The idea of the likelihood ratio test is shown for 
the mean of a normal distribution. The test for homogeneous variance illus- 
trates the construction of a test for a composite hypothesis. 

Chapter 12 is a short discussion on statistical design of experiments with 
the topics of validity, efficiency, and analysis of variance presented in the 
language of agriculture. A discussion on sampling covers single sampling, 
stratified sampling, and sequential analysis. Here, as throughout the book, 
only a bare introduction is given to these topics. A saving feature, however, 
is the many annotated references at the end of each chapter to which the 
interested student may refer. There are also many very good exercises, some 
numerical and others extending the theory already introduced. 

One wishes when he sees a book so ably done that the author had included 
much more material presented in the same excellent manner not only cover- 
ing more completely the topics presented here, but additional topics as well. 
This book, although intended only as a survey course, points the way to a 
more detailed study at this level of mathematical maturity, including exten- 
sions of nonparametric methods and analysis of variance beyond the sim- 
plest problems. 

Much stress is placed on efficiency and “best” procedures to use all the 
information available from a set of data. The reviewer believes that it is also 
important to know how less efficient procedures may be used for a quick ap- 
praisal of data. Also very useful “inefficient” statistics for the estimation of 
mean and variance are available. 

The standard notation is used throughout. Tables include: squares, square 
roots, ordinates and areas of the normal curve, and the standard tables of 
percentage points for χ², t, and F.

An error was noted on page 166. The circular definition of the mean square 
successive differences is given in place of the noncircular form. 

There are many points of similarity between this book and A First Course 
in Mathematical Statistics by C. E. Weatherburn. These two books require 
approximately the same mathematical maturity. Both make use of generating
functions, Hoel to a greater extent than Weatherburn. The similarity
in topics covered is best described by stating the major differences. Weatherburn
gives a more extensive treatment of analysis of variance including
analysis of covariance, but includes very little of the material in Chapters
9, 11, and 12 of Hoel.


Regression Analysis of Production Costs and Factory Operations, Second Edi- 
tion. Philip Lyle (Tate & Lyle, Ltd., Sugar Refiners, 40 Berkeley Square, London, 
W.1). Edinburgh 1, Scotland: Oliver & Boyd Ltd. (Tweeddale Court), 1946. Pp.
xii, 204. 16s. 
REVIEW BY H. A. FREEMAN
Associate Professor of Statistics, Massachusetts Institute of Technology 


This is a useful and interesting book on the application of ordinary regression
and correlation methods to the analysis of production costs and
factory operations in the sugar refining industry. Such an application clearly 
involves both statistical and economic problems, and the author gives atten- 
tion to both. 

I take it that this book is intended for industrial personnel who appreciate 
a kindred approach to an unfamiliar subject. But it may be well to state that 
it is by no means a popular, nontechnical account of a statistical technique 
that has been found to be useful in industry, and those altogether unfamiliar 
with statistics may find it heavy going. The algebra of simple and multiple 
correlation and regression, which the author considers in some detail, is 
tedious and requires close attention; moreover, the exclusion of all mathe- 
matics other than elementary algebra requires that certain important propo- 
sitions be demonstrated by discussion, and some of these discussions are not 
easy to follow. 

Within the limitations of an algebraic treatment, the statistical quality 
of the book is adequate. It deals at length with the meaning and computation 
of correlation and regression coefficients for two and more variables. There 
are also sections on the sampling errors of these statistics and on tests of 
various null hypotheses in correlation and regression, and an extended sec- 
tion on fiducial limits of estimates made from regression equations. There are 
two appendixes in which topics treated briefly or not at all in the main text 
are taken up. In one of these, two interesting nomograms (which appear later) 
are described, one for t and the other for use in direct testing of null hypotheses
on simple and multiple correlation coefficients. In another, ways and
means of determining functional relationships between two variables are 
discussed. 

Statistical analysis makes up the larger part of the book, but there are 
shorter sections on economic topics that are relevant to regression analysis 
of costs and output. Among these are one on unit costs, in which formulae for 
short and long term average, marginal, and arc costs are given, and another 
(plus related sections and isolated passages elsewhere) on the preparation of 
economic data for regression analysis. 

The book contains a brief bibliography, a summary of equations used in 
the text, a set of diagrams (inadequately labeled) illustrating correlation 
and regression, a glossary, and an index. 

There are several matters on which the reviewer would have welcomed
more information. Much attention is given to tests of hypotheses which seem 
to be of dubious economic significance; for example, testing the hypothesis 
of zero linear correlation with output-cost data. Such hypotheses seem 
merely to be straw men, and, whether the hypothesis falls or is left standing, 
little is learned about the economic nature of the situation. Another matter: 
the author’s “preparation” of economic data for regression analysis is, at 
least, risky. The sample size is selected so as to exclude improvements in 
manufacturing technique which would introduce “biassed errors”; adjust- 
ments are then made for variatioa in wage levels, price levels, renewal of 
equipment, and repairs and for the effect of work-in-progress on weekly out- 
put. These and other adjustments are discussed, but there is insufficient in- 
formation as to just how they are made. It is possible that these adjustments 
and allowances not only “cook” the data, as Lyle avers, but the analysis as 
well. 

I think that this is a useful book and that its principal value is as a good, 
detailed description of a successful application of regression and correlation 
technique to a difficult and important industrial problem. The reader who 
would like to learn the elements of correlation and regression would probably 
be better off to begin with a good general textbook, such as, say, that of Hoel 
or Yule and Kendall. 

The second and the first editions differ only slightly. The first edition was 
published in 1944 and was limited to 500 copies for sale in the world mar- 
ket; it was better printed on better paper. 


Sobre la Cuantificación del Estilo Literario: Una Contribución al Estudio de la
Unidad de Autor en “La Celestina” de Fernando de Rojas. [On the Quantification
of Literary Style: A Contribution to the Study of the Unity of Authorship
in La Celestina by Fernando de Rojas.] José V. Montesino Samperio (Assistant
Chief, Statistical Department, Banco Agrícola y Pecuario, Caracas, Venezuela).
Published in the Revista Nacional de Cultura, Nos. 55 and 56, 1946, Caracas, 
Venezuela. Pp. 63. Paper. 


REVIEW BY PAUL R. RIDER
Professor of Mathematics, Washington University 


The author of this brochure sets out to investigate by statistical methods
whether the sixteenth century drama La Celestina is the work of one
writer or of more than one. He proves rather conclusively that two distinct
literary styles are present and that it is highly probable that the first act was 
written by someone other than the author of the rest of the play. 

Some of the questions studied are number of words per paragraph, num- 
ber of sentences per paragraph, number of words per sentence, punctuation, 
figures of speech, rhythmic variation of sentence length. In reaching his con- 
clusion Montesino Samperio makes use of such techniques as testing the sig- 
nificance of the difference between the means or the standard deviations of 
large samples. No new methods are developed and no new statistical problems
are posed, and the booklet is not to be compared with Yule’s The Statistical
Study of Literary Vocabulary. (See review in this JOURNAL, Vol. 39,
1944, p. 527.) 


An Introduction to Industrial Statistics and Quality Control, Second Edition. 
Paul Peach (Associate Professor of Experimental Statistics, University of North 
Carolina, Raleigh). Raleigh, N. C.: Edwards Broughton Co., 1947. Pp. xv, 236. 
$5.00. Two reviews follow: 


REVIEW BY PAUL C. CLIFFORD
Assistant Professor of Mathematics, Montclair State Teachers College 


This book is a revision of the original mimeographed edition which was
favorably reviewed in this JOURNAL by J. H. Curtiss. While a few new
sections have been added, the text in general is that of the preliminary edi- 
tion, but the material has been rearranged and reorganized into eleven chap- 
ters: Introduction, Acceptance Sampling, Control Charts, Dispersion, 
Control Charts for Variables, Special Methods in Acceptance Sampling, 
Tests of Significance, Special Quality Control Methods, Industrial and 
Scientific Measurements, Scientific Experimentation, Organization and Pro- 
cedures. The author has aimed at providing a reference manual for practical 
men and also a textbook for college students. The book is a well-written, 
generally accurate, and very readable introduction to the elements of statis- 
tical method as applied in industrial control and experimentation. The math- 
ematical theory is kept at a minimum, but where it is needed for precise 
statement the author has not hesitated to use it. The book should be very 
useful as a college text where omitted material can be filled in and the prob- 
lem material can be supplemented, but as a reference manual its usefulness 
suffers from the omission or brief treatment of certain topics and from the 
absence of an index. 

The material on acceptance sampling plans for attributes is based on an 
ingenious use of the ratio p₂/p₁, which is an original contribution. The tables
needed for this method are comparatively simple, and by means of them one 
is able to establish quickly single, double, or sequential plans that will serve 
a wide variety of problems. The author, while stating the fundamental pur- 
poses, does not attempt to discuss the various published sampling plans in 
any detail. 

The theory of the control chart is well presented, and the general types of 
problems encountered on such charts are discussed. There is also a section 
on the use of charts with modified limits and the use of trend charts. The 
question of what constitutes a rational subgroup from the engineering viewpoint
is not adequately discussed, and there are not enough actual case histories
of the control chart to exhibit its full usefulness. Only two sets of control
chart data are given. The control chart for ΣX and the gridless control
chart which the author favors appear to have some psychological advantages. 
There is also a brief but succinct development of acceptance sampling by
variables, assuming a normal distribution with known standard deviation.

In the remaining chapters there is an introduction to the theory of statis- 
tical estimation, tests of significance, analysis of variance, and the design of 
experiments. This part of the book still gives, as the previous reviewer re- 
ported, a rather bewildering effect. The last chapters on industrial manage- 
ment are clear and concise. A set of twenty problems in the appendix gives 
in each case a fully solved problem followed by another of the same type for 
the reader to solve. 

There are few misleading statements; these are in the main a matter of 
emphasis. The recommendation given on page 87 that representative sam- 
pling be corrected by randomization might cause an incautious operator to 
lose the order of observation and never detect differences in spindles. Since 
good lots are defined on page 31 as fraction defective p₁ or less, the careless
student may, and in some cases does, assume that p₁ and p are the same, thus
leading to a conclusion to reject average material 5 per cent of the time. 

Despite these minor criticisms I know of no single text more useful for 
a college class in engineering statistics and quality control. The main flaw 
with the book in this regard is the lack of sufficient problem material. As a 
reference manual for quality control, such topics as the Poisson distribution, 
the theory of runs, serial correlation, and the Dodge continuous sampling 
plan might well have been included. As a work for the practical man in indus- 
try, more emphasis might have been given the economic aspects, even at the 
expense of some of the statistical theory. 


REVIEW BY BERNARD P. DUDDING
Research Physicist and Engineer, Research Laboratories 
General Electric Co. Ltd., Wembley, England 


The author of this book comments on the volume of pamphlets and articles
in the technical press which have appeared recently on the subject with 
which he deals. The increased interest in this subject has been created by the 
demand for war materials both in Great Britain and the United States 
of America. This outburst of enthusiasm on both sides of the Atlantic dur- 
ing the years 1942 to 1945 has aroused mixed thoughts in the minds of those 
who have persevered for some twenty years to interest those concerned with 
manufacture. On the one hand they rejoice to see a neglected subject taking 
a more correct place in the thoughts of the industrialist, but on the other 
hand they lament the years the locusts have eaten. This book will undoubt- 
edly do much to stimulate the training of students in applied statistics, and 
in so doing it will perform a useful service. The writer concurs with the au- 
thor’s opinion, indicated rather than expressed, that industry needs a greatly 
increased number of technicians having a working knowledge of statistics 
rather than a large number of trained statisticians. 

It is noted that the author calls attention in the preface to the use of the 
sums of results rather than averages as an original contribution to applied 
statistics. In view of the evidence the book provides of the author’s familiarity
with the subject, this claim is surprising because once statistical methods
are regularly used, then many similar devices are introduced in the normal 
course of events. 

While the idea of introducing statistical principles to engineers by a game 
of chance has much to commend it, the choice of a peculiarly American game 
will, for British readers, greatly reduce the appeal of this teaching device. 
Although the rules are enunciated, this will not greatly assist the apprecia- 
tion of the statistical principles involved until the reader has practical ex- 
perience of the game. This comment is made because it emphasizes a basic 
principle in the application of statistics, namely, that until the industrialist 
has tried to play the game of “applied statistics” in the familiar environment 
of his daily work, he will not appreciate fully its appropriateness, its power, 
and the help it provides. Further, apart from a reference to Fisher’s book,
there is little reference to the considerable contributions made in England 
both before and during the war. These two features detract from the value 
of a book which would otherwise have been widely acceptable. 

Acceptance sampling procedures are reviewed in considerable detail, and 
the more economically useful procedures based on sampling during produc- 
tion are developed therefrom. This order of presentation is probably inevit- 
able and arises from the history of manufacturing industries. The traditional 
practice of testing finished product for acceptance, adopted on a vast scale 
by Government departments concerned with the provision of materials 
and components for war purposes, tended to promote the application of 
statistical methods to these inspection procedures rather than to promote the 
application of the methods which aid the control of processes and thereby of 
products. In this environment the author is to be congratulated on the bal- 
anced review he has made of the various types of acceptance sampling. This 
review provides the right kind of assistance needed by engineers wishing to 
determine, in the light of the conditions and specifications which govern 
their activities, the procedure to follow when seeking initial experience. 
The fully worked out examples are important additional aids. 

The chapters referring to quality control and the discussion of the organization
of the work will do something to redress any lack of emphasis the order
of presentation places on this more economically useful aspect of applied 
statistics by which the production of rejectable product can be minimized. 
The writer would have liked to have seen some reference in the discussion 
on short runs to the utility of cataloguing average ranges for small samples 
as measures of the performance of a machine, a process, or an operator. When
control chart methods are well established, it is these data which provide 
“information from earlier job studies” so useful to the designer, the produc- 
tion planner, and the production supervisor. Reference is made to control 
charts involving trends. This type of problem has received theoretical treat- 
ment in Britain, but the statistical solutions proved to be of academic rather 
than practical importance. The introduction of modified limits, which is 
illustrated in the book, has usually met practical requirements.

The brief chapters on specifications and measurement can only serve to 
whet the statistical appetite of the engineer who has gained experience with 
the elementary statistical methods described earlier. Additional guidance 
would have to be sought elsewhere. 

The desirable procedure of tabulating the symbols used has been followed, 
and the author’s suggestion that the symbols and notation of quality control 
should be open to revision is specially commended. This subject demands the 
early attention of English-speaking technicians if undesirable precedents 
are not to prejudice the practical development of the subject. 

In a book containing so much evidence of balanced judgment, it is a shock 
to note a table giving factors to five significant figures for use in estimating a 
population standard deviation from the range in one small sample. The 
more usually accepted procedure is to utilize the average range derived from 
a number of small samples, and even then the factors need only to be known 
to two or three significant figures. 
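
The procedure the writer prefers can be sketched as follows. This is only an illustration under stated assumptions: the function name and the sample data in the usage note are hypothetical, and the d₂ factors are the usual control-chart constants quoted to three decimals, not figures taken from the book under review.

```python
from statistics import mean

# d2 factors: expected range of n observations from a standard normal
# population, rounded to three decimals (standard control-chart constants).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sigma_from_average_range(samples):
    """Estimate a population standard deviation as (average range) / d2.

    `samples` is a list of equal-sized small samples (hypothetical helper,
    written for this illustration only).
    """
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("all samples must have the same size")
    n = sizes.pop()
    if n not in D2:
        raise ValueError("no d2 factor tabulated here for this sample size")
    # Average the ranges over all samples before dividing by the factor.
    average_range = mean(max(s) - min(s) for s in samples)
    return average_range / D2[n]
```

For example, `sigma_from_average_range([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])` divides the average range 6 by 2.326. Since the range in any one small sample is itself highly variable, carrying the factor beyond two or three figures adds nothing to the estimate.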

It is always difficult to decide the order of presentation of statistical 
methods but the writer feels that Chapter 7, “Tests of Significance,” tends to 
destroy the link between acceptance sampling and quality control methods. 
Would it not be more appropriately placed next to the chapter “Scientific 
Experimentation”? 

The value of the book would be enhanced if it were provided with an index 
as well as the detailed table of contents. 

Paul Peach has succeeded in writing a readable and instructive manual on 
the application of simple statistical principles to industrial problems. The 
book fulfils its purpose and will serve as a valuable introduction to industrial 
statistics both for the engineer and the college student. Elementary text- 
books yet to be written will place emphasis on different aspects of industrial 
applications because of the experience of their respective authors, but they 
will certainly contain much that is well stated and illustrated in the book reviewed.


Actuarial Statistics: Vol. 1, Statistics & Graduation. H. Tetley (National Provi- 
dent Institution, 48 Gracechurch St., London E.C. 3). Coordinating editor, 
Harry Freeman (Government Actuary’s Department, Caxton House East, 
Tothill St., London S.W. 1). Published for the Institute of Actuaries and the 
Faculty of Actuaries. London N.W. 1: Cambridge University Press (Bentley 
House, 200 Euston Road), 1946. Pp. xi, 285. 21s. Two reviews follow:


REVIEW BY EUGENE LUKACS
Professor of Mathematics 
Our Lady of Cincinnati College 


The Institute and the Faculty of Actuaries have a long and outstanding
record of promoting the education of actuaries. They have organized 
courses for the prospective members of their profession and have published 
textbooks for actuarial students. It is interesting to note the ever-expanding 
scope of these courses which is reflected in the publications. The present
volume is not an independent text but an integral part of this series of text- 
books. It assumes that the reader is familiar with the subject matter of Free- 
man’s book (Mathematics for Actuarial Students, 2 Vols., Cambridge Uni- 
versity Press, 1939). Comparing this text with previous publications one 
notes an increased emphasis upon statistical studies. The book is not intend- 
ed to be a self-contained text on statistics but is written for a group of 
readers interested in certain highly specialized applications. This considera- 
tion inevitably influenced the choice of the subject matter. Consequently a 
statistician may find that a great many topics which he considers to be of 
paramount importance are excluded. This is true for most of the methods 
of mathematical statistics which have been developed during the last forty 
years. The most important exception is the chi-square test of goodness of 
fit which is discussed to some extent. These omissions are justified by the 
purpose of the book and the fact that the actuary has—at least at present— 
little opportunity to apply modern statistical methods. The aim of the author 
is to give the student an introduction into the elementary parts of statis- 
tics and to enable him later to study more advanced topics. It can be ex- 
pected that the book will stimulate the reader and will arouse his interest 
in the mathematical methods of statistics. 

The book is divided into two parts. The first four chapters deal with 
statistics, the last six with graduation. Chapter 1 is essentially a review of 
the fundamentals; wherever necessary, references to Freeman’s text are 
given. Chapter 2 deals with the binomial and the normal distributions. Two
proofs are given for the convergence of the binomial to the normal distribu- 
tion (De Moivre’s theorem). The first proof is made under the assumption 
that p = q = ½; the second proof supposes (q − p)/npq to be small. Instead of
standardizing the variable of the binomial distribution, the variance npq
is fixed during the passage to the limit. This conflicts with the assumption 
underlying De Moivre’s theorem that the probabilities are constant. The 
proof itself is open to serious objections. Besides, it is not even mentioned 
that the theorem is of general validity. In this reviewer’s opinion it is not 
quite satisfactory to prove the theorem only under restrictive conditions. It 
might have been preferable merely to state the theorem without a proof. 
Another possibility would have been to mention the central limit theorem
and to present the convergence of the binomial to the normal distribution as 
a special case. Chapter 3 discusses correlation and regression. Throughout the 
greater part of this chapter it is assumed that the regression is linear. At the 
end some remarks are made on curvilinear regression which are utilized later. 
Chapter 4 deals with sampling and introduces the reader to some of the more 
modern ideas. This chapter should provide an incentive to the thoughtful 
reader to continue his statistical studies. A clear distinction is made between 
a population parameter and the corresponding statistic derived from a sam- 
ple. Nevertheless, the notation used does not avoid ambiguity as occasion- 
ally the same letter is used for both. Questions of statistical inference are 
discussed; the idea of the sampling distribution is introduced. In this connection
the author remarks that comparatively little is known about sampling
distributions and that it is assumed in most instances that sampling distributions
follow the normal curve. It is regrettable that the existence of the
extensive theory of sampling from a normal population is not mentioned. 
Moreover it is certainly not generally true that the normal law holds for any 
statistic calculated from the sample. Instead of this sweeping statement, a 
reference to the asymptotic properties of sampling distributions would have 
been helpful. It is known that the distribution of any sample characteristic 
based on moments is asymptotically normal, provided certain conditions are 
satisfied. This theorem would have justified the author’s assumption that 
the sampling distributions discussed in the text are normal. 
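
For reference, the standardized form of De Moivre's theorem referred to in the remarks on Chapter 2 may be written as below. The symbols S_n (the number of successes in n independent trials) and Φ (the standard normal distribution function) are notation introduced here for illustration and are not taken from the book under review.

```latex
\[
\lim_{n\to\infty}
P\!\left( a \le \frac{S_n - np}{\sqrt{npq}} \le b \right)
= \Phi(b) - \Phi(a),
\qquad q = 1 - p,\quad 0 < p < 1 \ \text{fixed}.
\]
```

In this form the probabilities p and q stay constant while n increases, and it is the variable, not the variance npq, that is adjusted during the passage to the limit.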

Chapter 5 approaches graduation as the problem of fitting a curvilinear 
regression line to the observations. The observations are considered as a 
sample, and tests for smoothness as well as tests of fidelity to data are dis- 
cussed. In this connection the chi-square test of goodness of fit is introduced. 
Chapter 6 deals with the graphic method of graduation; Chapter 7, with 
graduation by reference to a standard table. Chapter 8 discusses graduation 
by summation formulae. As an introduction, the operator [n] (“summation 
n”) is reviewed briefly. Operands are chosen so as to avoid second difference 
errors, and various well-known formulae of graduation are discussed. It is 
mentioned that not every graduation formula based on linear compounding 
of ungraduated values is a summation formula, although every summation 
formula can be so expressed. The expanded formula is considered; the 
coefficient curve and the wave cutting index are discussed. The error reduc- 
ing index and the smoothing index are defined, and detailed examples for 
the analysis of summation formulae along these lines are given. It is regret- 
table that this analysis is not extended to the study of the curves reproduced 
by a summation formula. This more general approach would include gradua- 
tion based on linear compounding. At the same time a deeper understanding 
of the nature of mechanical graduation could be gained. Chapter 9 discusses 
graduation by mathematical formulae such as Makeham’s law. A concise and 
clear description of the usual methods is given. Again it would have been 
valuable to show that Makeham’s law is preserved by certain graduation 
formulae. A few remarks on curve fitting conclude this chapter. Chapter 10 
gives a brief and lucid discussion of osculatory interpolation. 

Throughout the text the reader will find many interesting examples. 
Enough problems are worked out in detail to give the student a better appre- 
ciation of the methods discussed. Each chapter is followed by a brief bibliog- 
raphy which should be helpful to the reader desiring to penetrate deeper 
into the subject. 


REVIEW BY MORTIMER SPIEGELMAN
Metropolitan Life Insurance Company, New York 10 


This volume is designed as a textbook for students preparing for the examinations
of the Institute of Actuaries (England). It assumes a knowledge
of the measures of location and dispersion as described in an earlier textbook 
published for the Institute of Actuaries, Mathematics for Actuarial Students
by H. Freeman. 

The development of the section on statistics is along traditional lines. The 
first chapter defines the moments of a frequency distribution, the mode, 
mean deviation, skewness, and index numbers and develops the formulae 
of King and Hardy commonly used to estimate unit intervals from a series
of group totals of equal length. The second chapter deals with the binomial
and normal distributions; the third, with correlation; and the fourth (the last
chapter on statistics), with some elementary notions of sampling. This con- 
tent introduces the actuary to the elements of statistical method. It can be 
supplemented profitably by further reading, since statistical sampling pro- 
cedures are becoming of increasing importance in actuarial work. They have 
already been used in some reserve valuations and also in connection with 
some actuarial mortality investigations; papers in the latter field of applica- 
tion have recently appeared in the Transactions of the Actuarial Society of 
America, Vol. 46, 1945 and Vol. 47, 1946. The use of modern statistical nota- 
tion to distinguish between universe and sample would have made it easier 
for those actuaries looking for new statistical tools to read more advanced 
texts. 

There are several instances where the discussion needs clarification. Thus, 
on page 34, the author offers the remark that “There is in fact only one normal
curve, a fact which makes it of the first importance in statistical theory.”
Perhaps it was intended to say that there is only one mathematical form by 
which the normal curve is recognized. But why this unique form should give 
the normal curve alone great importance in statistical theory is not clear; 
surely, the fact that there is one easily recognized expression for the Cauchy 
distribution does not give it any importance. 

Then, from page 89, “Comparatively little is known about sampling dis- 
tributions. In most instances, however, it is assumed that these distribu- 
tions follow the normal curve of error. Actual experiments show that this 
assumption is not unreasonable for most functions, but it is felt nowadays 
that very few sampling distributions are really accurately represented by 
any such simple law.” This statement overlooks completely the fact that our 
knowledge of sampling distributions is by no means negligible and is still 
advancing at a good rate. Later on page 89, we find that, except for the χ²
test, “for practical purposes we almost invariably assume that the normal
law holds for any statistic calculated from a sample: i.e. that if we took 
thousands of such samples, each consisting of the same number of observa- 
tions and calculated the statistic for each, they would be found to be dis- 
tributed according to the [normal] law.” Apparently, the author intends this 
to cover the case of samples with few observations, as well as those with 
many. The student is not given any indication that the distribution of the 
statistic may vary with the size of the sample or that there are distributions 
other than the normal and χ² that are of practical importance.

The method of least squares, which is treated briefly and not too adequately
in the section on graduation, might have been presented earlier on
page 56, where the author uses the method to develop the expression for 
estimating the slope of a straight line fitted to observations. The statistical 
chapters also contain a few other instances, of lesser importance than those 
mentioned, where the exposition might have been made more obvious. Thus, 
on page 24, it is said that “The phrase weighted mean is usually used in statis- 
tical books when the actual frequencies are not available and have to be 
estimated.” This confusing statement follows a clear example explaining 
the weighted mean. 

The section of the book devoted to graduation has an opening chapter on 
general considerations and tests, followed by chapters on the graphic method, 
on graduation by reference to a standard table, on graduation by summation 
formulae and by mathematical formulae, and on osculatory interpolation. 
These chapters are, essentially, summaries of the graduation methods devel- 
oped by English actuaries over more than a century. 

The description of tests for smoothness of a graduation on page 117 has 
the suggestion “that, in comparing two different graduations for smooth- 
ness, the sum of the third differences should be found for each table.” The 
author neglects to add here that this sum should be taken without regard to 
sign, although his subsequent treatment shows that this is the proper pro- 
cedure. It might have been pointed out that tests for smoothness are not 
needed where the graduation has been by mathematical formula. 
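
The point about sign can be made concrete with a short sketch. The Python here is illustrative only: the function names and the two sets of graduated rates are hypothetical and are not taken from the book under review. It sums third differences without regard to sign; a signed sum would telescope to the difference between the last and the first second differences and so would say little about smoothness.

```python
def third_differences(values):
    """Return the third finite differences of a sequence of graduated values."""
    diffs = list(values)
    for _ in range(3):
        diffs = [later - earlier for earlier, later in zip(diffs, diffs[1:])]
    return diffs


def smoothness(values):
    """Sum of the third differences taken without regard to sign."""
    return sum(abs(d) for d in third_differences(values))


# Hypothetical graduated rates from two competing graduations (illustrative only).
graduation_a = [0.0100, 0.0108, 0.0117, 0.0127, 0.0138, 0.0150]
graduation_b = [0.0100, 0.0110, 0.0115, 0.0129, 0.0136, 0.0150]

print("A:", smoothness(graduation_a))
print("B:", smoothness(graduation_b))
```

On this criterion the graduation with the smaller total would be judged the smoother of the two.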

In graduation of mortality rates, there need be no single road to the end 
product, and the methods used will be influenced by the object in view. Some- 
times the results of a graphic graduation, which is described here in some 
detail, will be graduated further by another method, and, in other instances, 
the results of a graduation by some computing process may be treated fur- 
ther graphically. The student, often inclined to follow one method minutely, 
should be made aware of the wide scope he has in using his talents. 

The chapter on graduation by summation formulae is lengthy and con- 
tains considerable detail regarding the method. However, one wonders 
why all this attention is given since (p. 187) “Summation formulae owe their 
importance to the ease with which they can be applied, but with modern 
mechanical aids this is not now as important as formerly.” Again, on pages
190, 195, and 213, brief reference is made to more general formulae of the 
linear compound type (of which summation formulae are a special class), 
but the matter is not discussed further in the book, nor is the student given 
any references. The text of this chapter needs correction at three points. On 
page 197, the second paragraph from the bottom should read “Hence, . . .
the standard deviation of the third difference of the ungraduated error. . . .”
In the left hand side of the formula at the top of page 201, the Δ is omitted.
Finally, at the top of page 204, the smoothing index should be the square 
root of the sum of squares of the coefficients of the third differences divided 
by the square root of 20.

The chapter on graduation by mathematical formulae deals largely with
the Makeham and Gompertz curves and with blending functions. We also 
find here that “The most important curves for fitting to statistical data were 
developed by Karl Pearson and bear his name.” At most, this statement 
could apply to frequency distributions. 

The author seems to doubt the general applicability of osculatory inter- 
polation formulae, but their use in such important public documents as the 
decennial English life tables since 1901 and those for the United States mer- 
its special attention to them. To some extent, the “weak graduating power” 
of osculatory interpolation formulae has been overcome by the modification 
introduced by Jenkins; this is not described in the present text and is only 
referred to briefly in the pages of Freeman’s book cited in the bibliography. 
Another rather important omission from the book is the difference-equation 
method of graduation; this method has a certain popularity with American 
actuaries. 

The section on graduation in the book under review bears little resem- 
blance to the monograph Elements of Graduation prepared by Morton Miller 
and published jointly by the Actuarial Society of America and the American 
Institute of Actuaries in 1946. The latter deals only with fundamentals and 
does not go into detail on any one topic; it does, however, cover briefly the 
topics omitted by Tetley, namely, linear compounding, modified osculatory 
interpolation, and the difference-equation method. The monograph by Miller 
eases the student into the more comprehensive Mathematical Theory of 
Graduation by Robert Henderson, published by the Actuarial Society of 
America in 1938. For a detailed study of graduation, reference should also 
be made to Trend Analysis of Statistics by Max Sasuly (1934). 

Among the first books to which the actuarial student may turn to supple- 
ment his statistical knowledge gained from the volume under review is 
The Fundamental Principles of Mathematical Statistics, by Hugh H. Wolfen- 
den (1942), which was prepared “with special reference to the requirements 
of actuaries and vital statisticians.” Wolfenden’s book not only clarifies some 
points left obscure by Tetley but also includes many additional topics of 
importance. Also, Wolfenden has an excellent “outline of a course in gradua- 
tion” describing the principles involved, including those of methods not dis- 
cussed by Tetley. 

Since this book was designed specifically as a textbook for British actuarial 
students, it is difficult to appraise it in the light of more general purposes. 
The examining bodies of the Institute of Actuaries in England and of the 
Faculty of Actuaries in Scotland are, of course, the judges of what is to be 
expected of their students. Evidently, their statistical requirements are 
limited to the topics covered by Tetley. However, in the section on gradua- 
tion, the omission of the methods mentioned above does leave a gap to be 
filled. The book contains many excellent illustrative examples and problems 
using insurance data.


Tables for Testing the Homogeneity of a Set of Estimated Variances. Computed
by Catherine M. Thompson and Maxine Merrington. Prefatory note by H. O.
Hartley and E. S. Pearson. From Biometrika, Vol. 33, Part 4, June 1946. London
W.C. 1: Biometrika Office, University College, 1946. Pp. 295-304. 2s. Paper.


REVIEW BY G. A. BAKER
Assistant Professor of Mathematics and Assistant Statistician 
in the Experiment Station, College of Agriculture 
University of California, Davis 


MODERN statistical analyses of data often lead to the calculation of a
number of independent estimates of variances. If these estimates can
be assumed to apply to the equal variances of normal parents, the subse- 
quent analyses are often much simplified. Many investigators in the past 
have used the simplified analyses without any critical consideration of the 
validity of the assumptions of homogeneity and normality. The first test of 
the homogeneity of population variances was the L₁-test given by J. Neyman
and E. S. Pearson (1931). The L₁-test is based on the random sampling
distribution of a ratio of a weighted geometric to a weighted arithmetic
mean of mean squares from which the variances are estimated. This sampling 
distribution has not as yet been found exactly except in special cases. 
M. S. Bartlett (1937) modified the weights used in the L₁-test and showed
that the chi-square distribution, under certain conditions, can be used as an 
approximation to the sampling distribution. H. O. Hartley (1940) found 
that the weighted means of chi-square distributions gave a better approxi- 
mation. The present tables are based on Hartley’s results. 

The criterion for testing homogeneity of variances is denoted by M instead
of the earlier L₁. If there is a common variance, then the probability
distribution of M is approximately described by k (the number of independent
estimates of variances), c₁ (the sum of the reciprocals of the numbers
of degrees of freedom of the separate estimates minus the reciprocal
of the sum of individual degrees of freedom), and c₃ (an expression similar to
c₁ but involving third powers of the numbers of degrees of freedom instead
of first powers). Tables 1 and 2 permit the 5 and 1 per cent significance levels
of M to be obtained. They are tables of double entry for k and c₁ with two
extreme values for each combination of these two quantities. If necessary,
c₃ may be calculated to select an intermediate value of M. An example and
detailed explanations are given. When the numbers of degrees of freedom 
on which the estimates of the variances are based are all greater than or equal 
to 3, the present approximation is very good; and where some of the degrees 
of freedom are as small as 2, the approximation is still adequate. However, 
a systematic bias is present which is small if all of the numbers of degrees of 
freedom are equal to or greater than 4. 
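
As a rough sketch of the quantities just described, the following Python computes M, c₁ and c₃ from a set of variance estimates and their degrees of freedom. The form of M used here (N times the logarithm of the pooled variance, less the sum of νᵢ ln sᵢ², with N the total degrees of freedom) is Bartlett's and is assumed to match the notation of the tables; c₃ follows the verbal description above; the function names and the data are hypothetical. The tabulated significance levels are not reproduced, so the code only prepares the entries with which Tables 1 and 2 would be consulted. The last function is Bartlett's earlier chi-square approximation, not Hartley's refinement.

```python
import math


def bartlett_m(variances, dfs):
    """Return (M, c1, c3) for independent variance estimates.

    variances: the separate mean squares s_i^2
    dfs:       their degrees of freedom nu_i
    """
    total_df = sum(dfs)
    pooled = sum(nu * s2 for nu, s2 in zip(dfs, variances)) / total_df
    m = total_df * math.log(pooled) - sum(
        nu * math.log(s2) for nu, s2 in zip(dfs, variances)
    )
    c1 = sum(1.0 / nu for nu in dfs) - 1.0 / total_df
    c3 = sum(1.0 / nu ** 3 for nu in dfs) - 1.0 / total_df ** 3
    return m, c1, c3


def bartlett_chi_square(m, c1, k):
    """Bartlett's (1937) corrected statistic, referred to chi-square with k - 1 d.f."""
    return m / (1.0 + c1 / (3.0 * (k - 1)))


# Hypothetical mean squares, each on 5 degrees of freedom (illustrative only).
sample_variances = [1.0, 4.0, 9.0]
sample_dfs = [5, 5, 5]

m, c1, c3 = bartlett_m(sample_variances, sample_dfs)
print("M =", m, "c1 =", c1, "c3 =", c3)
print("Bartlett chi-square =", bartlett_chi_square(m, c1, len(sample_dfs)))
```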

The general idea of the use of the tables is to apply the test of homo- 
geneity; and if the probability of nonhomogeneity is less than 5 or 1 per cent, 
then the sample variances are treated as though they were estimates of a
common variance. This procedure leads to a bias in the results of the subsequent
statistical analysis, as has been pointed out by T. A. Bancroft (1944),
since errors of the second kind in the sense of Neyman and Pearson may
often be made; that is, the variances which are being estimated may differ
somewhat widely and still the test indicate homogeneity at the 5 or 1 per
cent level. The effect and magnitude of this bias have not been fully investigated.
Also, the effects of possible non-normality of underlying frequency
distributions have not been discussed. 

The application of these tables will be much better than a uniform uncritical
assumption of homogeneity of variances and should greatly improve
current statistical practice. 


The Factorial Analysis of Human Ability, Second Edition. Godfrey H. Thomson 
(Professor of Education, University of Edinburgh). London E.C.4: University 
of London Press, Ltd., June 1946. Pp. xvi, 386. 20s. (Boston 7: Houghton Mifflin
Co. [2 Park St.], 1947.)


REVIEW BY HELEN M. WALKER
Professor of Education, Teachers College, Columbia University 


WHEN a new edition of a well-known work appears, readers of a review
will wish to know what relation the revision bears to the earlier edition.
Almost all of the first edition of this book has been retained. The first fifteen 
chapters (240 pp.) have received such revision as could be accomplished 
without affecting pagination, and two subsequent chapters “Limits to the 
Extent of Factors” and “The Sampling of Bonds” and the mathematical 
appendix are practically unchanged. Some new material has been inserted 
at the ends of some of these chapters, minor numerical corrections have 
been made in certain examples, N has been replaced by N − 1 in the denominator
of several formulas, and references to Addenda at the end of the book
have been inserted. The section on Simple Structure has been completely 
rewritten and greatly amplified. There are new chapters on Oblique Factors, 
Second-Order Factors, the Maximum Likelihood Method of Estimating Fac- 
tor Loadings (written by D. N. Lawley), and a searching final chapter en- 
titled “Some Fundamental Questions.” 

Presumably the economies of postwar publishing dictated this manner of
revision which at times produces a rather disjointed effect. For example, a 
four-line paragraph was inserted stating that the use of the standard error 
of r should be discontinued unless the sample is large and r small and refer- 
ring to an Addendum where the use of Fisher’s z-transformation is explained. 
Yet throughout the book the earlier treatment making use of the formula 
(1 − r²)/√(N − 1) for the standard error of r has been retained, without even the
suggestion that this r is the population and not the sample value. Other il- 
lustrations could be adduced of an improved technique which is carefully 
presented in one of the new sections of the book but which has not replaced
a less satisfactory technique in the sections carried over from the first edition.
Therefore the reader should be warned that the book must be read in its en- 
tirety and in particular the Addenda must not be overlooked. 
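
The contrast between the two treatments of r can be seen in a small numerical sketch. The Python here is not taken from Thomson's book; the function names, the sample values r = 0.8 and N = 12, and the use of 1.96 as a normal multiplier are all assumptions made for illustration. It compares an interval built from (1 − r²)/√(N − 1) with one built from Fisher's transformation z = artanh r, whose standard error is approximately 1/√(N − 3).

```python
import math


def se_r_classical(r, n):
    """Large-sample standard error of r: (1 - r^2) / sqrt(n - 1)."""
    return (1.0 - r * r) / math.sqrt(n - 1)


def fisher_interval(r, n, z_crit=1.96):
    """Approximate interval for the correlation via Fisher's z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r, n = 0.8, 12  # hypothetical sample correlation and sample size
half_width = 1.96 * se_r_classical(r, n)
print("classical interval:", (r - half_width, r + half_width))
print("Fisher z interval: ", fisher_interval(r, n))
```

With a small sample and a large r, the symmetric interval can run past 1, which is one way of seeing why the older formula is recommended only when the sample is large and r small.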

The author assumes almost no technical training on the part of his readers; 
he defines the coefficient of correlation and the standard deviation and helps 
his readers visualize geometric models by speaking of walls instead of planes. 
Yet he attacks quite difficult ideas, developing them in non-technical lan- 
guage and employing vivid phrases to stimulate the imagination of the 
reader (“the crowd of persons is in the ellipsoidal condition,” “Such factors 
are like trees and houses to the man who is doomed only to see the ground. 
He can measure the length of the tree’s shadow but not its height. He must 
remain content with shadows. And so with factors.” “... the mother- 
tongue being, as it were, the physical body of the mind, its acquired struc- 
ture.” [Simple structure] “is rather like wanting to run a school with as few 
teachers as possible, but each teacher to have a large number of free peri- 
ods.”) The book is more a treatise than a text. It contains no practice mate- 
rial, no exercises, no lists of questions or other stigmata of the textbook. There 
is far less emphasis on matrix algebra and economical routines for computa- 
tion than is found in most of the books and articles about factor analysis 
published in the United States and far more emphasis on underlying philoso- 
phy, meanings, and interpretation. If any one imagines that mastery of ma- 
trix algebra and n-dimensional geometry will be sufficient equipment for 
making basic contributions to factor analysis, he should with a humble mind 
read and ponder Section V, “The Interpretation of Factors,” in particular 
the chapters on “Definition of g,” “The Sampling of Bonds,” and “Some 
Fundamental Questions.” 

American readers may be surprised that the strong emphasis on Spear- 
man’s Two-Factor theory, present in the first edition, has been retained. 
American writers tend to treat the Two-Factor theory as an interesting and 
important stage in the historical development of Factor Analysis about 
which a well-educated factorist cannot afford to be completely uninformed. 
Thomson like most English writers accords it a more prominent role and dis- 
cusses at length hierarchical order, the meaning of g, the sampling reliability 
of tetrad-differences, and other Spearman concepts, commenting, “The main 
idea which still, rightly or wrongly, dominates factorial analysis was enun- 
ciated then by him, and practically all that has been done since has been 
either inspired or provoked by his writings.” 

Several topics are treated here which have not yet found their way into 
textbooks printed in America. Part IV presents an admirable brief exposi- 
tion of the somewhat controversial idea of correlations between persons, a 
topic upon which much has been printed in England but almost nothing in 
America. The material presented in Lawley’s chapter on “The Maximum 
Likelihood Method of Estimating Factor Loadings,” otherwise available only 
in English periodicals, is of great importance particularly because it offers 
the most satisfactory solution thus far furnished to the vexing problem of 
significance tests as to “when to stop factoring.”

Mention of certain points which appear to the reviewer as flaws must be
made in passing. On page 9 a possible solution of a set of five linear equations 
with six unknowns is presented as though it were a necessary solution. The 
concept of mathematical expectation is no more difficult than many other 
concepts presented here, and its use in such discussions as that of the sam- 
pling distribution of tetrad-differences on page 47, together with a statement 
of the assumptions as to independence of samples, would make the treatment 
theoretically better. Variates with zero mean and unit variance are called 
“standardized” and the comment is made that about two-thirds of them lie 
between plus and minus one. Variates with zero mean and variance equal to
the number of cases are called “normalized.” This is likely to make the untutored
reader assume normality of distribution where such is not necessary.
The geometric picture presented in Chapter IV is highly confusing and it is 
regrettable that it was not completely rewritten in line with the suggestion 
acknowledged in the Addenda. If a reader is not already familiar with the 
space in which orthogonal axes represent individuals and a single point rep- 
resents the scores of all individuals on a single test, so that the cosine of the 
angle between two test vectors represents the coefficient of correlation be- 
tween those tests, the reviewer does not see how he could gain clear under- 
standing from this chapter. If he is already familiar with this idea, he will be 
puzzled to be told that individuals are represented by points along the test 
vectors until he learns from the Addendum that individual axes have been 
projected on the test vectors. 
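
For the reader who has not met the person space before, a minimal sketch may help. In the Python that follows, each test is a vector with one coordinate per individual, measured from the test mean, and the cosine of the angle between two such vectors is compared with the correlation computed from the usual raw-score formula. The scores and function names are hypothetical and are not drawn from Thomson's text.

```python
import math


def centered(scores):
    """Express each individual's score as a deviation from the test mean."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]


def cosine(u, v):
    """Cosine of the angle between two vectors in the person space."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def correlation(x, y):
    """Product-moment correlation from the raw-score formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))


# Hypothetical scores of five individuals on two tests (illustrative only).
test_1 = [12, 15, 9, 20, 14]
test_2 = [30, 34, 25, 41, 29]

print(cosine(centered(test_1), centered(test_2)))
print(correlation(test_1, test_2))
```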

No serious student can afford to remain unfamiliar with this book. If he 
is exclusively interested in algebraic manipulation or even in mathematical 
theory, it will irritate him in a highly therapeutic manner and give him new 
points of view. If he is primarily interested in the psychological interpreta- 
tions to be made from factor theory and puzzled by the controversies in this 
area, he will be grateful for the penetrating analyses found here. 

Thomson’s own comment on the importance of factor analysis should be 
quoted in closing: “Yet with all the dangers and imperfections which attend 
it, it is probable that the factor theory will go on, and will serve to advance 
the science of psychology. For one thing, it is far too interesting to cease to 
have students and adherents. There is a strong natural desire in mankind 
to imagine or create, and to name, forces and powers behind the facade of 
what is observed, nor can any exception be taken to this if the hypotheses 
which emerge explain the phenomena as far as they go, and are a guide to 
further inquiry. That the factor theory has been a guide and a spur to many 
investigations cannot be denied, and it is probably here that it finds its chief 
justification.” 


A First Course in Mathematical Statistics. C. E. Weatherburn (Professor of
Mathematics, University of Western Australia, Perth, W.A.). London N.W. 1:
Cambridge University Press (Bentley House, 200 Euston Road), 1946. Pp. xv,
271. 15s. (New York 11: Macmillan Co. [60 Fifth Ave.], 1947. $3.50.) Two reviews
follow:


REVIEW BY WALTER LEIGHTON
Professor of Mathematics, Washington University 


PROFESSOR Weatherburn dedicates his book to Professor R. A. Fisher
and to the memory of Professor Karl Pearson. The reader, it is believed,
will agree that it is a worthy contribution in a distinguished tradition. The 
book is carefully written, and the result is a high order of clarity through- 
out. It is not a little astonishing to realize after reading the book how much 
both of theory and of example has been set comfortably in 271 pages. 

The author presupposes an elementary knowledge of the calculus and 
addresses his book to specialists in the applied sciences, not to mathemati- 
cians. It would appear that he has abided by his agreement with his readers 
without offending the sensitivity of the mathematician in the matter of 
attention to rigor of demonstration. His treatment of probability is quite 
adequate for the purpose of his book. He does not cast his lot with any of the 
more vociferous schools of thought on this subject, and in the opinion of this 
reviewer, he quite properly avoids a critical and comparative analysis of 
the more generally accepted approaches to the subject. 

An innovation is the author’s use of the properties of Beta and Gamma 
variates to derive the sampling distributions of those statistics which are 
fundamental in the more commonly used tests of significance. His exposi- 
tion in this connection has been influenced greatly by papers of D. T. Saw- 
kins, which influence he graciously acknowledges. This reviewer was very 
favorably impressed by the unity and coherence which the use of Beta and 
Gamma variates brings to this type of exposition at a relatively elementary 
level. 
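
One instance of the kind of unity meant here can be checked by simulation. The sketch below is not Weatherburn's derivation; it only illustrates the standard fact that half the square of a standard normal variate is a Gamma variate with parameter 1/2, and that independent Gamma variates add, so that a sum of n squared normal deviates is distributed as twice a Gamma variate with parameter n/2. The function names, the sample size, and the seed are arbitrary choices.

```python
import random


def simulate(n_df=4, samples=20000, seed=0):
    """Draw chi-square variates two ways: from normals and from a Gamma variate."""
    rng = random.Random(seed)
    from_normals = [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_df)) for _ in range(samples)
    ]
    # random.gammavariate(alpha, beta): alpha is the shape, beta the scale.
    from_gamma = [2.0 * rng.gammavariate(n_df / 2.0, 1.0) for _ in range(samples)]
    return from_normals, from_gamma


def mean_and_variance(values):
    """Sample mean and variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


normals, gammas = simulate()
print("sum of squared normals:", mean_and_variance(normals))
print("twice a Gamma variate: ", mean_and_variance(gammas))
```

Both sets of draws should show a mean near n and a variance near 2n, as chi-square with n degrees of freedom requires.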

Some readers will regret that the treatment of small samples is deferred 
until the last third of the book (pp. 185 ff.). Its central importance nowadays 
in applications would seem to suggest very early attention. 

The typography of the book is excellent, and the rare misprints which 
were observed were not misleading. This is a book every statistician will 
want in his library. 

REVIEW BY CHARLES P. WINSOR
Assistant Professor of Biostatistics, School of Hygiene 
and Public Health, The Johns Hopkins University 


THIS book is intended to cover a year’s course on the mathematics of statistics,
for students with “an ordinary knowledge of the Integral Calculus.”
It covers much of the same ground as Hoel’s Introduction to Mathematical
Statistics, reviewed elsewhere in this issue. The first seven chapters
cover classical ground, including the binomial, Poisson, and normal distributions,
the bivariate normal, the theory of simple sampling of attributes and
variables, and the standard errors of statistics. There follows a chapter on 
Beta and Gamma distributions, with useful theorems on the distribution of 
sums, products, and quotients of such variates. Then follow chapters dealing 
with chi-square, Student’s t, the variance ratio, the analysis of variance and
covariance, and multivariate distributions.

The author makes little attempt at mathematically rigorous or precise
definitions of his fundamental concepts. In fact, after several readings, the 
reviewer is not clear what definition of a frequency distribution would be con- 
sistent with the statements in the text. We are first introduced to a group 
of N objects, giving rise to a distribution of values of some variable. We then
meet the statement, “We shall presently consider continuous frequency dis- 
tributions. But it should be pointed out at once that the variable may be 
either continuous or discrete.” We then find: “A continuous distribution is 
one in which the variable takes every value between certain limits, a and b. 
The total frequency is therefore infinite.” There follows, in Chapter 2, a dis- 
cussion of probability distributions, and in Chapter 3 we find a discussion of 
the binomial distribution, with the statements: “A binomial frequency dis- 
tribution is one in which the relative frequencies ... are equal to their 
probabilities in the above distribution,” and “in a theoretical frequency 
distribution like the above, the individual frequencies are not necessarily 
integral.” 

On this point Hoel seems definitely clearer. He distinguishes clearly be- 
tween an observed frequency distribution, which is necessarily finite in size, 
and a theoretical frequency distribution, thought of as the parent population 
from which a sample has been drawn. 

Neither Weatherburn nor Hoel seems entirely satisfactory in the treat- 
ment of correlation and regression. Neither distinguishes clearly between 
the case of a random sample drawn from a bivariate population and the case 
of a selection of values of the independent variable. Both give as the standard
error of estimate from a regression line, σ√(1 − r²), without consideration
of the uncertainty of the estimate of the regression equation. 
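
The size of the neglected uncertainty can be gauged from a short sketch. The Python here is an illustration under stated assumptions, not a reconstruction of either text: σ is taken as the standard deviation of the dependent variable with divisor n, the fuller expression is the usual one for predicting a single new observation at x₀ from a fitted line with independent errors of constant variance, and the data and names are hypothetical.

```python
import math


def regression_errors(x, y, x0):
    """Return (textbook error, prediction error at x0, residual sum of squares)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)

    # Textbook form: sigma * sqrt(1 - r^2), with sigma computed on divisor n.
    textbook = math.sqrt(syy / n) * math.sqrt(1.0 - r * r)

    # Residual variance on n - 2 degrees of freedom, then the standard error of a
    # new observation at x0, which allows for the sampling error of the fitted line.
    sse = syy - sxy * sxy / sxx
    residual_variance = sse / (n - 2)
    prediction = math.sqrt(
        residual_variance * (1.0 + 1.0 / n + (x0 - x_mean) ** 2 / sxx)
    )
    return textbook, prediction, sse


# Hypothetical observations (illustrative only).
x_values = [1, 2, 3, 4, 5, 6]
y_values = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

for x0 in (3.5, 6.0, 10.0):
    print(x0, regression_errors(x_values, y_values, x0)[:2])
```

The textbook figure is the same at every x₀, whereas the fuller expression grows as x₀ moves away from the mean of the observed x values, most noticeably in small samples.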

Weatherburn’s treatment of nonlinear regression is brief. He gives the 
least squares solution for a parabolic regression and indicates that some non- 
linear expressions may be linearized by transformation. He does not indicate 
that such linearization will not give the least squares fit to the original equa- 
tion unless suitable weighting is used. Hoel seems sounder on this point. 
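
The point about linearization can be illustrated with a sketch for the curve y = a·exp(bx). The code fits the logarithms by ordinary least squares, refits them with weights y² (a common first-order choice, assumed here rather than taken from either book), and then minimizes the sum of squares on the original scale by Gauss-Newton steps with step-halving. The data, the function names, and the choice of an exponential curve are all hypothetical.

```python
import math


def fit_line(x, z, w=None):
    """Weighted least squares for z = c0 + c1 * x; returns (c0, c1)."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    z_mean = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - x_mean) * (zi - z_mean) for wi, xi, zi in zip(w, x, z))
    c1 = sxz / sxx
    return z_mean - c1 * x_mean, c1


def sse(a, b, x, y):
    """Sum of squared residuals of y = a * exp(b * x) on the original scale."""
    return sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))


def gauss_newton(a, b, x, y, iterations=50):
    """Refine (a, b) to reduce the original-scale sum of squares."""
    for _ in range(iterations):
        e = [math.exp(b * xi) for xi in x]
        resid = [yi - a * ei for yi, ei in zip(y, e)]
        ja = e                                        # derivative with respect to a
        jb = [a * xi * ei for xi, ei in zip(x, e)]    # derivative with respect to b
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * r for u, r in zip(ja, resid))
        gb = sum(v * r for v, r in zip(jb, resid))
        det = saa * sbb - sab * sab
        if det == 0.0:
            break
        da = (ga * sbb - sab * gb) / det
        db = (saa * gb - sab * ga) / det
        current, step = sse(a, b, x, y), 1.0
        # Halve the step until the sum of squares actually decreases.
        while step > 1e-10 and sse(a + step * da, b + step * db, x, y) >= current:
            step /= 2.0
        if step <= 1e-10:
            break
        a, b = a + step * da, b + step * db
    return a, b


# Hypothetical observations (illustrative only).
x_obs = [0, 1, 2, 3, 4, 5]
y_obs = [1.1, 1.4, 2.9, 4.1, 7.9, 11.5]
log_y = [math.log(v) for v in y_obs]

c0, c1 = fit_line(x_obs, log_y)
a_log, b_log = math.exp(c0), c1
c0w, c1w = fit_line(x_obs, log_y, [v * v for v in y_obs])
a_wtd, b_wtd = math.exp(c0w), c1w
a_gn, b_gn = gauss_newton(a_log, b_log, x_obs, y_obs)

for label, a, b in (("log fit", a_log, b_log), ("weighted", a_wtd, b_wtd),
                    ("direct", a_gn, b_gn)):
    print(label, a, b, sse(a, b, x_obs, y_obs))
```

The unweighted logarithmic fit minimizes the squared errors of ln y, not of y, so its sum of squares on the original scale cannot fall below that of the direct fit.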

Weatherburn’s chapter on analysis of variance and covariance is brief 
and not too helpful to the applied statistician. Hoel’s discussion (which omits 
covariance entirely) is even briefer but does bring out clearly one of the main 
uses of analysis of variance, the elimination of irrelevant variation both 
from the estimates of effects and from the estimates of error. 

There is no adequate discussion in Weatherburn of the principles on 
which methods of estimation or tests of significance should be chosen. The 
first topic is untouched; the only discussion of the second (in §91) seems likely 
to mislead a careless reader. In this section we find the following. First, 
“By means of it [the χ² test], however, we merely test the rareness of the sum
of squares Σzᵢ². Clearly, before accepting the sample as not a rare one, we
must be satisfied about other points also.” And he goes on to say that “a 
set of n statistically independent variates may be formed in many ways from
the given variates, each of which provides a further test of the rareness of
the sample.” While the antecedent of “which” in this sentence is perhaps not 
clear, it does seem clear that the industrious statistician ought always to be 
able to find some test which would show that his sample is a rare one; and 
apparently he has Weatherburn’s authority for such a procedure. 

Hoel recognizes that problems do exist in connection with methods of 
estimation and choice of significance tests, though one may wonder whether 
his treatment is adequate to clarify the mind of the student. (Incidentally, 
it is a little surprising to find no reference in Hoel to Neyman and E. S.
Pearson; their names are not even mentioned in the discussion of Type I 
and Type II errors.) 

Rather surprisingly, Weatherburn gives no discussion of contingency 
tables. His discussion of significance tests for the difference between ob- 
served proportions in two samples deals only with the large sample case. 
Hoel discusses contingency tables in general but gives no special treatment 
for the 2 × 2 case.
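
The connection between the two topics is close. As a minimal sketch, with hypothetical counts and function names, the Python that follows computes the large-sample normal deviate for the difference between two observed proportions (using the pooled proportion in the standard error) and the uncorrected χ² for the corresponding 2 × 2 table. The square of the first equals the second. No continuity correction or exact small-sample treatment is attempted.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Large-sample normal deviate for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: 30 successes in 100 trials against 45 in 120.
z = two_proportion_z(30, 100, 45, 120)
chi2 = chi_square_2x2(30, 70, 45, 75)
print(z * z, chi2)
```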

Weatherburn’s policy in regard to references seems, to the reviewer, 
preferable to that of Hoel. Weatherburn lists about a hundred titles starting 
with Helmert’s 1875 and 1876 papers and coming down through Wilks’ and 
Kendall’s books. Hoel gives references to the original literature only (appar- 
ently) where no textbook is available. There are a few rather surprising 
omissions from Weatherburn’s references. Thus, Fisher’s classical paper of 
1922 is not cited, nor is Pearson’s 1895 paper on skew variation. The latter 
omission is the more surprising in that the 1894 paper on the decomposition 
of a distribution into two normal curves is listed. The only reference to Neyman
and E. S. Pearson is to their 1931 paper on the problem of k samples.

The reviewer’s general conclusion is that both Weatherburn’s and Hoel’s 
books should be on the shelves available to students but that most teachers 
will probably feel that neither is completely satisfactory. This is perhaps a 
good thing; it will force the teacher to decide for himself what his students 
should be taught. 


LETTERS ABOUT BOOKS 








Readers are invited to submit letters about statistical methodology books for 
publication in this forum. Concise, informative letters which supplement 
previously published reviews by pointing out specific strengths, weaknesses, 
errors, and errata in currently used books are wanted. Criticisms based on 
actual use of a book as a text are especially desired from statistics instruc- 
tors. Other letters may consist of suggestions for the writing of books and 
reviews. Letters which contain adverse criticisms of JOURNAL reviews will 
be submitted to the author of the review for any reply he may care to make. 
Contributors are requested to avoid personalities. The right to decide whether 
a letter merits publication is reserved. Letters should be sent to the review 
editor, Oscar K. Buros, Rutgers University, New Brunswick, N. J. 





INDUSTRIAL STATISTICS 


IN THE September 1946 issue of this
JOURNAL, John W. Tukey and
Charles P. Winsor published a letter
about my textbook Industrial Statistics
(John Wiley and Sons, New York,
1942). I wish to comment on many of
the passages in this letter. 

Tukey & Winsor: After outlining the use of a t test after an F test, Freeman
points out that the difference between the largest and the smallest will often
appear “significant” due solely to error, and then warns the reader briefly
that “A t test applied to two means after over-all homogeneity has either been
refuted or not must be used with caution.” ... Yet on the next page he uses
the t test on the best and second best without comment! In noticing this point
we do not wish to imply that a good solution exists—we know of none—but we
feel that the reader should have been warned.

First, the warning is on the preceding 
page and readers would, I thought, re- 
call and apply it without a further re- 
minder. Second, the warning may not 
need to be stressed here, for the t test
applied to a difference of adjacent 
means is probably a conservative test. 
(Conservative tests are not, of course, 
necessarily good tests.) 
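
Both sides of this exchange can be examined by a small simulation. The sketch below is neither the textbook's procedure nor that of Tukey and Winsor. It draws k groups from a single normal population and, to avoid needing t tables, assumes the variance known, so that a normal deviate with the 1.96 multiplier stands in for the t test. The group count, sample size, number of trials, and seed are arbitrary. It records how often a pair fixed in advance, the largest and smallest means, and the two largest means appear “significant.”

```python
import random


def rejection_rates(k=6, n=10, trials=2000, z_crit=1.96, seed=1):
    """Proportion of nominally significant differences when no real differences exist."""
    rng = random.Random(seed)
    se_diff = (2.0 / n) ** 0.5  # standard error of a difference of two means, sigma = 1
    fixed = extreme = adjacent = 0
    for _trial in range(trials):
        means = [
            sum(rng.gauss(0.0, 1.0) for _obs in range(n)) / n for _group in range(k)
        ]
        ordered = sorted(means)
        if abs(means[0] - means[1]) / se_diff > z_crit:
            fixed += 1      # a pair chosen before seeing the data
        if (ordered[-1] - ordered[0]) / se_diff > z_crit:
            extreme += 1    # largest against smallest
        if (ordered[-1] - ordered[-2]) / se_diff > z_crit:
            adjacent += 1   # best against second best
    return fixed / trials, extreme / trials, adjacent / trials


fixed_rate, extreme_rate, adjacent_rate = rejection_rates()
print("pair fixed in advance:", fixed_rate)
print("largest vs smallest:  ", extreme_rate)
print("best vs second best:  ", adjacent_rate)
```

Under these assumptions the pair fixed in advance should be rejected at about the nominal rate, the extreme pair far more often, and the adjacent pair far less often.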

T & W: ... it is not until page 87 that
we learn that the author does not want
to use the L₁ test unless the sample sizes
are equal. What is one supposed to do
with two unequal samples? (Why not use
the L₁ test?)

For unequal sample sizes no tables of 
L₁ were available; moreover, for this
case, the L₁ test is biased. Finally, the
accuracy of the average sample size 
method proposed by Nayer is poor. I 
would now use Bartlett’s test for un- 
equal (and equal) sample sizes. 


T & W: On page 87 Freeman states the 
question which the L₀, L₁ and F tests
are supposed to answer, but he makes 
no statement of the very important fact 
that these tests assume that the different 
samples came from normal populations 
whether or not the hypothesis tested is 
true. 


With respect to these tests the as- 
sumption of normality was stated on 
page 17, and on pages 87-90. 


T & W: On page 24 we find that lack 
of statistical significance implies that 
“Hardness is really affected by stain, 
whereas breaking (bending) strength is 
not.” That is, from lack of statistical sig- 
nificance we conclude lack of real effect— 
this seems highly unsound! In the actual 
situation, it is extremely probable that 
whatever chemical action is involved in 
staining does have some average effect 
on the bending strength. While the data 
probably do support the conclusion that 
the effect of stain on bending strength is 
unimportant in practice, it is very im- 
portant to distinguish between this con- 
clusion and that stated on page 24. 


The two conclusions are identical. I 
should have been more explicit here 
and my choice of the word “real” was 
a poor one; but it should be evident to 
anyone that stain probably has some 
effect on strength—this would be so 
even had we found t=0—and that I 
could be speaking only of effects that 
are “unimportant in practice.” (Nor 
should it be inferred that I believe that 
if a difference is statistically significant, 
it is important in practice.) 

T & W: On page 25, we have a sample
of 5 and a firm decision not to make a test
of normality. We conclude that Freeman
would stop testing somewhere between
15 and 5 observations.

As noted in the textbook (in a passage
quoted by Tukey and Winsor),
tests for normality applied to small 
samples are sensitive only to very large 
departures from normality. 

T & W: The student may easily conclude
that in samples of 50 or more it is useless
to test normality, since it does not matter.
We would hold, on the contrary, that
samples of more than 50 are needed to
make a test of normality which is sensitive
enough to give any comfortable assurance
of sufficient normality to warrant
the use of the common tests at even the
5 per cent level.


This view should be accepted at a 
discount. In a recent address (to be 
published in the Biometrics Bulletin), 
W. G. Cochran summarized the exten- 
sive literature on the effects of non- 
normality on the validity of the F and t 
tests as follows: 

The consensus from these investigations 
is that no serious error is introduced by 
nonnormality in the significance levels of
the F test or of the two-tailed t test.
While it is difficult to generalize about 
the range of populations that were in- 
vestigated, this appears to cover most 
cases encountered in practice. If a guess 
may be made about the limits of error, 
the true probability corresponding to 
the tabular 5 per cent significance level 
may lie between 4 and 7 per cent. For 
the 1 per cent level, the limits might be 
taken as ½ per cent and 2 per cent. As a
rule, the tabular probability is an underestimate:
that is, by using the ordinary F
and t tables we tend to err in the direction
of announcing too many significant
results. 
The one-tailed t test is more vulnerable.
With a markedly skew distribution of 
errors, where one tail is much longer 
than the other, the usual practice of 
calculating the significance probability 
as one-half the value read from the 
tables may give quite a serious over- or 
under-estimate. 
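
The behavior described in the quoted passage can be explored by simulation. The sketch below is an illustration only, not Cochran's calculation: it assumes an exponential population (markedly skew), samples of 10, and the tabular 5 per cent points of t for 18 and 9 degrees of freedom. The function names and the choice of population are assumptions made for this example.

```python
import random

T_TWO_TAILED_18_DF = 2.101  # tabular 5% two-tailed point of t, 18 d.f.
T_ONE_TAILED_9_DF = 1.833   # tabular 5% one-tailed point of t, 9 d.f.


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def two_sample_t(x, y):
    """Pooled-variance t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * sample_variance(x)
              + (ny - 1) * sample_variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (pooled * (1 / nx + 1 / ny)) ** 0.5


def one_sample_t(x, mu):
    """t statistic for testing that the population mean equals mu."""
    return (mean(x) - mu) / (sample_variance(x) / len(x)) ** 0.5


def rejection_rates(reps=20000, n=10, seed=1947):
    """Estimate how often true hypotheses are rejected at the tabular 5% level.

    Both samples come from the same exponential population (mean 1),
    so every rejection counted here is a false one.
    """
    rng = random.Random(seed)
    two_tailed = lower = upper = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        y = [rng.expovariate(1.0) for _ in range(n)]
        if abs(two_sample_t(x, y)) > T_TWO_TAILED_18_DF:
            two_tailed += 1
        t = one_sample_t(x, 1.0)
        if t < -T_ONE_TAILED_9_DF:
            lower += 1
        elif t > T_ONE_TAILED_9_DF:
            upper += 1
    return {
        "two_tailed": two_tailed / reps,
        "one_tailed_lower": lower / reps,
        "one_tailed_upper": upper / reps,
    }


if __name__ == "__main__":
    for name, rate in rejection_rates().items():
        print(f"{name}: {rate:.3f}")
```

Under these assumptions the two-tailed comparison of two samples should stay near the nominal level, while the two one-tailed rates for a single skew sample should differ noticeably from each other and from 5 per cent.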

T & W: In practice, evidence about
normality or its lack comes from broad
experience or other samples.


In a series of experiments, each experiment
need not, and probably
should not, stand only on its own feet,
but generally speaking, I believe that
most experimenters will get into more
trouble from reliance on “broad experience
or other samples” than from the
use of any formal test for normality at
any sample size level.

T & W: On pages 37-38 Freeman de- 
velops a more or less conventional deriva- 
tion of the normal distribution from 
many contributing effects. There is no
indication that these effects must be 
additive. (Clearly if additive effects 
make x normally distributed, then
multiplicative effects make e^x non-normal!)


This is a good point and a neat ex- 
ample, but my derivation clearly states 
the condition of additivity. The word 
“must” probably should not be used; 
many distribution functions of non- 
additive elements are normal or 
asymptotically normal, although, ad- 
mittedly, different proofs are required. 
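
The reviewers' example can be checked numerically. The following sketch assumes 50 small independent uniform effects (an arbitrary choice for illustration): their sum x should be nearly symmetric, while e^x, the product of the corresponding multiplicative effects, should be markedly skew. The function names are hypothetical.

```python
import math
import random


def skewness(values):
    """Moment coefficient of skewness, m3 / m2^(3/2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_skewness(n_effects=50, reps=20000, seed=7):
    """Return (skewness of additive totals, skewness of multiplicative totals)."""
    rng = random.Random(seed)
    additive, multiplicative = [], []
    for _ in range(reps):
        # x is the sum of many small independent effects.
        x = sum(rng.uniform(-0.2, 0.2) for _ in range(n_effects))
        additive.append(x)
        # exp(x) is the product of the factors exp(effect).
        multiplicative.append(math.exp(x))
    return skewness(additive), skewness(multiplicative)


if __name__ == "__main__":
    s_add, s_mult = simulate_skewness()
    print(f"skewness of x:   {s_add:.3f}")
    print(f"skewness of e^x: {s_mult:.3f}")
```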

T & W: On page 72 we are told that “The 
F test, as used in the analysis of vari- 
ance, is essentially the ratio of variability 
associated with a suspected cause to 
error.” This is, it seems to us, exceedingly 
liable to misinterpretation—only if the 
reader really understands the analysis of 
variance in advance can he interpret the 
word “associated” safely. 

The word “associated” is preceded 
by twenty pages of discussion of analy- 
sis of variance, including nine examples 
worked out in detail. 
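
For the simplest one-way classification, the ratio in question can be written out directly. The sketch below is a generic textbook computation, not a reproduction of any example from the book under review; the data and the function name are invented for illustration.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way classification.

    F is the mean square between groups (variability associated with the
    suspected cause) divided by the mean square within groups (error).
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Hypothetical data: three groups of three observations each.
    print(one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

With these invented data the between-groups mean square is 13 and the error mean square is 1, on 2 and 6 degrees of freedom.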

T & W: Freeman is then able, by some 
principle not clear to us, to classify to- 
gether under “among cylinders” (page 86) 
terms with mean squares of 4566, 1922, 
and 183—by the size of their mean 
squares alone they are of vastly different 
nature. Freeman seems to imply that to 
interpret an interaction term it should 
be classified under one or the other main 
effects, a procedure which seems to us 
contrary to the basic principles of the 
analysis of variance. 


It would seem so to me, too. The re- 
viewers’ discussion of my analysis of 
the Hampton-Gould data is defective. 
My purpose was to show how to per- 
form formal analysis of variance, since 
few other textbooks of which I knew 
had done so; that is, how to analyze a 
k-factor experiment into 2^k - 1 main
order and interaction effects. This I 
did. I then listed terms in such a way as 
to reach the table Tippett discusses at 
length, and I referred the reader to 
Tippett for this discussion. 
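
The count of 2^k - 1 effects can be verified by enumeration. In this small sketch the factor labels A, B, C are placeholders, not the factors of the Hampton-Gould experiment.

```python
from itertools import combinations


def factorial_effects(factors):
    """List every main effect and interaction for the given factors."""
    effects = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            effects.append("x".join(combo))
    return effects


if __name__ == "__main__":
    effects = factorial_effects(["A", "B", "C"])
    print(effects)        # three main effects, then the four interactions
    print(len(effects))   # 2**3 - 1
```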

The terms with mean squares of 
4566, 1922, and 182 are not of “vastly 
different nature”; in fact, they are of
similar nature, and they are listed un- 
der “among cylinders” because, from 
the point of view of proper engineering 
classification, this is where they belong. 
They are of significantly different magnitude
and are, therefore, never combined.


T & W: On page 74... all the F critical 
values are incorrect—the degrees of free- 
dom have been permuted in using the 
table. The correct critical value for 
“interaction vs. error” is about 2.31. 


Tukey and Winsor were reading 
in the wrong column of the table. 
The correct critical value is about 
2.47, as given in the second printing 
(issued in 1944) of my textbook. 


T & W: On page 56 Freeman states that 
“it is unlikely . . . that just five machines 
would be used and that each would be 
used exactly once with each grid and 
each operator. If these conditions are 
not satisfied, machine effects cannot be 
removed.” Nonorthogonal analysis of 
variance is not easy to analyze or inter- 
pret—but it is an essential tool in de- 
ciphering incomplete experiments. 


Nonorthogonal analysis of variance 
is not entirely suitable material for an 
elementary textbook. 


T & W: On page 67, we are told that the 
use of the Latin Square “wastes” degrees 
of freedom which could otherwise be 
allotted to error. This question arises 
not only for Latin Squares, but in every 
design situation where some degrees of 
freedom have been allotted to some vari- 
able of potential importance or effect. 
It seems to us that no waste is involved; 
if the mean square for this additional 
variable is about the size of the error, the 
average practitioner will, wisely or un- 
wisely, pool these together with no loss 
of degrees of freedom... . 


“Wisely or unwisely” is the question, 
both in theory and practice, and much 
could be said about it. For example, 
pooling coupled with moderate non- 
normality may lead to serious errors in 
inference. Here I should like only to 
record myself as in general sympathy 
with the warning of a recent reviewer 
who wrote as follows: 


In carrying out the analysis of vari- 
ance, the procedure used is to test doubt- 
fully significant mean squares and, if 
they are not significant at the 5 per cent
point, to then pool them into a new
estimate of error mean square. The possible
pitfalls in this process require much
careful consideration.—John W. Tukey, 
this Journal, March 1946, page 126. 


T & W: On page 106 we are told that 
“It is here assumed that the variances of 
all eight columns are equivalent, within 
the limits of chance variation. This as- 
sumption, which may be checked by the 
L1 test, must be met for the F test to
be valid.” It seems to us that any inter- 
pretation that a student is likely to put 
on this is wrong. For example, these mis- 
interpretations seem likely: (a) Unless 
the sample variances of the eight columns
meet the L1 test, the F test is invalid.
(b) If the population variances of the 
eight columns are nearly enough equal 
to allow the sample to pass the L1 test,
the F test is exact. The proper interpretation,
of course, would be: (c) the F test
is only exact when the eight column 
populations are normal with the same 
variance; we may be able to detect large 
deviations from equality in the sample; 
the F test is not badly affected by lack of 
equality; we will not make too many serious
mistakes if we use the F test, checking
suspicious cases with the L1 test
and watering down our conclusions when 
the lack of homogeneity seems large. 


I am unable to see that this is an im- 
provement over what I wrote on this 
subject (only two sentences of which 
were quoted). Tukey and Winsor are 
substituting one inexact statement— 
and, in my opinion, a lamentably elas- 
tic one—for another. 


T & W: In Shewhart’s original discussion 
he made it clear that he defined control 
economically, not statistically. 


Research into the original meaning 
of the term “control” may not repre- 
sent time well spent, and I believe it is 
true that the meaning of the term has 
changed somewhat over the years. But 
for the record, the only definition of 
“control” given by Shewhart in his 
1931 treatise, Economic Control of 
Quality of Manufactured Product, is as 
follows (p. 6; the italics are Shew- 
hart’s): 


Definition of Control 

For our present purpose a phenomenon 
will be said to be controlled when, through 
the use of past experience, we can predict,
at least within limits, how the phenomenon
may be expected to vary in the
future. Here it is understood that prediction
within limits means that we can
state, at least approximately, the probability
that the observed phenomenon will
fall within the given limits.
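
The idea of prediction within limits, with an approximately stated probability, can be sketched as follows. This is a simplification for illustration: the limits are set at three sample standard deviations around the mean of past individual values, whereas control chart practice usually works from subgroups. The data and function names are hypothetical.

```python
import math


def control_limits(past_values, width=3.0):
    """Limits at mean +/- width * (sample standard deviation) of past data."""
    n = len(past_values)
    centre = sum(past_values) / n
    sd = math.sqrt(sum((v - centre) ** 2 for v in past_values) / (n - 1))
    return centre - width * sd, centre + width * sd


def normal_coverage(width=3.0):
    """Probability of falling within the limits if the values are normal."""
    return math.erf(width / math.sqrt(2.0))


def chebyshev_coverage(width=3.0):
    """Lower bound on that probability for any distribution with finite variance."""
    return 1.0 - 1.0 / width ** 2


if __name__ == "__main__":
    print(control_limits([9.0, 10.0, 11.0]))
    print(round(normal_coverage(), 4), round(chebyshev_coverage(), 4))
```

The two coverage figures bracket the "at least approximately" of the definition: about 0.9973 if normality holds, and at least about 0.889 by Tchebycheff's inequality if the mean and standard deviation are taken as known.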


T & W: Control meant that it was not 
economic to try to remove further causes 
of variation, which would almost cer- 
tainly exist and cause nonrandomness in 
any practical state of control. The con- 
trol chart procedure was developed em- 
pirically to help to attain this state— 
and was to be used both before and after 
control was reached. Yet on page 133 we 
are told that the type of “systematic 
quality control” discussed in this chapter 
cannot be used until population homo- 
geneity has been reached. On this internal 
evidence, then, these methods are only 
useful for the, possibly uneconomic, re- 
finement of a quality control program 
set up by more useful methods. 


On page 133 I wrote as follows: 
“More often than not, in industrial 
practice, population homogeneity will 
be achieved only after months of ef- 
fort, and a quality control program of 
the kind suggested in this chapter will 
not be immediately possible. In such 
cases the statistician can best serve by 
assisting in the design and analysis of 
experiments which aim to identify the 
cause of non-homogeneity.” 

A similar point of view was held (last 
year) by one of the reviewers. In his review
of Brownlee’s Industrial Experimentation
(this JOURNAL, March 1946,
pp. 125-127), Tukey wrote: 


The brief discussion of control chart pro- 
cedures gives little guidance in choosing 
between control chart and Fisherian 
methods in a specific instance, yet many 
of the examples which the author treats 
simply and directly would be difficult or 
impossible to treat by control chart 
methods. The control chart procedure 
was designed and is fitted to study 
processes which are or should be in 
control—that is, the successive values 
resemble a random sample. The Fish- 
erian methods were designed and are 
fitted to study processes which are not 
in control, but where auxiliary data or 
the recurrence of the uncontrolled effects 
allows the partial separation of different 
effects. This distinction is fundamental. 


There is little I can now say about
suggestions for additional examples,
though if I planned to rewrite this textbook,
I should accept several such suggestions.
I have no comments on criticisms
of two examples in my chapter on
regression, on which I agree with my 
critics. I also agree that multiple tests 
of significance are important; my text- 
book contains about as much informa- 
tion on them as most other textbooks 
—that is to say, none. An important 
reason for this deficiency is that little 
was known about multiple tests in 1941 
(and not much more is known in 1947). 

I have no comment to make on such 
varied matters as the inevitable Cau- 
chy distribution; the reviewers’ gen- 
eral indictment of unnamed “eminent 
statisticians” from whom I have bor- 
rowed; on such phrases as “we are re- 
luctant to conclude” when no such con- 
clusion, reluctant or eager, need be 
drawn; on my failure to include certain 
lengthy and difficult topics, such as es- 
timating components of variance, in an 
already lengthy one-semester textbook 
(which omissions the reviewers con- 
sider “particularly unfortunate”); nor 
on nearly a column of criticism which 
reveals that in discussing a particular
test (in an elementary textbook), I 
failed to consider one of the five possi- 
ble cases, which case, the reviewers re- 
mark, occurs infrequently and when it 
does occur, the problem of drawing a 
conclusion is unsettled. 

I should like to state my present 
views on some aspects of industrial sta- 
tistics. 

a) I believe that small samples gen- 
erally have to be examined by non- 
parametric tests in which weak or no 
assumptions are made regarding the 
form of the population, though the gen- 
eral problem of choosing between effi- 
cient tests with strong conditions 
(which the data may not quite satisfy) 
and inefficient tests with weak condi- 
tions (which the data do satisfy) is un- 
settled. 
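
One example of a test with weak conditions is the exact permutation (randomization) test for the difference between two small samples. The sketch below is offered as an illustration of the class of tests mentioned in (a), not as a recommended procedure; the data and the function name are invented.

```python
from itertools import combinations


def permutation_p_value(x, y):
    """Exact two-sided permutation test for a difference in means.

    Every way of splitting the pooled observations into groups of the
    original sizes is enumerated; no form is assumed for the population.
    """
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / n_y)

    extreme = count = 0
    for idx in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        count += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / count


if __name__ == "__main__":
    print(permutation_p_value([1, 2, 3], [4, 5, 6]))
```

With two samples of three, only 20 splits exist, so the smallest attainable two-sided probability is 2/20. This shows concretely the inefficiency of weak-condition tests in very small samples.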

b) Analysis of variance seems to me 
to be an overworked technique in ex- 
perimental science. Most of the multi- 
variate problems I have seen are con- 
cerned with the experimental determi- 
nation of that combination of values of 
certain variables which will yield max- 
ima or minima in other variables. Fac- 
torial designs may be very wasteful 
here, for in them much of the experi- 
mental data may be devoted to combi- 
nations that are far from optimal. This 
problem has, however, hardly been 
touched theoretically.¹ Finally, whenever
analysis of variance is used, it is
well to consider the power function of 
the test. 

c) I suggest very tentatively that 
Shewhart quality control charts might 
be discussed from the point of view of 
the theory of testing hypotheses, with 
emphasis on the two kinds of errors or, 
possibly, on the power function of con- 
trol chart tests. I used to feel rather 
strongly that this provided a modern 
statistical foundation for Shewhart’s 
excellent methods; but this idea may 
not be such a good one as I originally 
supposed. 

d) Any adequate chapter on sam- 
pling inspection will need to be much 
different from those now in print. The 
development, principally by the Sta- 
tistical Research Group during the 
war, of sequential sampling plans for 
attributes and variables, the extended 
use of noncentral t, and the sequential- 
izing of this important statistic (which 
gives us, for the first time, sampling in- 
spection plans for variables without 
nuisance parameters), the populariza- 
tion of operating characteristic and av- 
erage sample size curves as criteria for 
judging sampling plans, and the large 
number of new sampling tables and 
charts need be considered. 
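
A sequential sampling plan for attributes can be sketched with the sequential probability ratio test. The version below assumes the usual approximate boundaries, (1 - beta)/alpha and beta/(1 - alpha); the fractions defective p0 and p1, the risks, and the function name are illustrative assumptions, not values taken from any published table.

```python
import math


def sprt_attributes(inspections, p0, p1, alpha, beta):
    """Sequential test of fraction defective p0 (acceptable) against p1 (rejectable).

    `inspections` is a sequence of 0 (good item) or 1 (defective item).
    Returns (decision, number_inspected), where decision is "accept",
    "reject", or "continue" if the items ran out before a boundary was crossed.
    """
    upper = math.log((1 - beta) / alpha)    # reject the lot at or above this
    lower = math.log(beta / (1 - alpha))    # accept the lot at or below this
    step_defective = math.log(p1 / p0)
    step_good = math.log((1 - p1) / (1 - p0))

    log_ratio = 0.0
    n = 0
    for n, defective in enumerate(inspections, start=1):
        log_ratio += step_defective if defective else step_good
        if log_ratio >= upper:
            return "reject", n
        if log_ratio <= lower:
            return "accept", n
    return "continue", n


if __name__ == "__main__":
    plan = dict(p0=0.01, p1=0.10, alpha=0.05, beta=0.10)
    print(sprt_attributes([0] * 30, **plan))   # a run of good items
    print(sprt_attributes([1, 1, 1], **plan))  # a run of defectives
```

The number inspected is itself a random quantity, which is why average sample size curves, together with operating characteristic curves, serve as criteria for judging such plans.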

e) Discriminant functions, which are 
being regularly used by some industrial 
experimenters, should be considered. 

¹ See, however, Chapter 13, “Planning Experiments
Seeking Maxima,” by Milton Friedman
and L. J. Savage, in Techniques of Statistical Analysis,
a forthcoming book by the Statistical Research
Group, Columbia University; New York:
McGraw-Hill Book Company, Inc.

f) The problem of tolerances is very
important, for, generally speaking,
tight tolerances cause unnecessary industrial
expense. The problem has been
solved for special cases. The general
problem is this: given complete information
on the physical nature and economic
cost of an assembly and that the
assembly should operate within tolerances
±k of a standard A, what should
be the tolerances on each of the component
parts in order that, with random
assembly, the proportion of assemblies
with tolerances within ±k of
A will be 1 - ε?
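
One of the simplest special cases may be sketched as follows. It assumes, purely for illustration, that the assembly dimension is the sum of n independent, identically and normally distributed component dimensions, each centered on its nominal value, and it ignores costs altogether. Under those assumptions the component variances add, and the permissible component standard deviation follows directly. The function names and figures are hypothetical.

```python
from statistics import NormalDist


def component_sigma(k, n_parts, eps):
    """Largest component standard deviation for which a proportion 1 - eps
    of randomly assembled units falls within +/- k of the standard.

    Assumes an additive stack of n_parts independent, centered, normal parts.
    """
    z = NormalDist().inv_cdf(1 - eps / 2)   # two-sided normal deviate
    sigma_assembly = k / z
    return sigma_assembly / n_parts ** 0.5  # variances add under random assembly


def proportion_within(k, n_parts, sigma_part):
    """Proportion of assemblies within +/- k under the same assumptions."""
    sigma_assembly = sigma_part * n_parts ** 0.5
    return 2 * NormalDist(0, sigma_assembly).cdf(k) - 1


if __name__ == "__main__":
    k, n_parts, eps = 0.03, 9, 0.0027
    sigma = component_sigma(k, n_parts, eps)
    print(f"component sigma:          {sigma:.5f}")
    print(f"statistical +/-3 sigma:   {3 * sigma:.5f}")
    print(f"worst-case tolerance k/n: {k / n_parts:.5f}")
    print(f"proportion within +/-k:   {proportion_within(k, n_parts, sigma):.4f}")
```

In this hypothetical case the three-sigma component tolerance is roughly three times the worst-case figure k/n, which illustrates how needlessly tight tolerances arise when random assembly is not taken into account.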

g) In applications, testing of hypoth- 
eses that are often practically meaning- 
less, as, for example, some null hy- 
potheses, is much less useful than in- 
terval estimation. In current textbooks, 
far more attention is given to the 
former subject than to the latter. It 
seems desirable that in the future this 
proportion of space be reversed. 

h) The chief weakness in all of mod- 
ern industrial statistics is the absence 
of detailed economic considerations. 
The significance of results obtained 
from industrial experimentation, the 
selection of sampling inspection plans, 
the setting of control chart limits, and 
practically everything else in indus- 
trial statistics involve costs, selling 
prices, and the like, but these are sel- 
dom found in the discussion. On con- 
trol charts it has been the easy practice 
to use whatever probabilities someone 
else has found or thought to be eco- 
nomic in his own work. This practice 
needs to be terminated. But before ac- 
counting data can be effectively used, 
it will be necessary to provide a gen- 
eral theory which incorporates the re- 
lationship of economic quantities to 
such statistical quantities as the
amount of sampling, level of signifi- 
cance, power of a test, and the like. 
Such a theory is almost entirely un- 
available at this time. It is not obvious 
that it is the job of statisticians to pro- 
vide this theory, but it must be pro- 
vided if the field of industrial statistics 
is to rest on secure foundations. 

Finally, I want to express to Dr. 
Tukey and Dr. Winsor my apprecia- 
tion of the detailed consideration they 
have given my book. 

H. A. FREEMAN, Associate Profes- 
sor of Statistics, Massachusetts 
Institute of Technology 